Some jobs upon finalization may need to perform some work that can
still fail. If these jobs are part of a transaction, it's important
that these callbacks fail the entire transaction.
We allow for a new callback in addition to commit/abort/clean that
allows us the opportunity to have fairly late-breaking failures
in the transactional process.
The expected flow is:
- All jobs in a transaction converge to the WAITING state
(added in a forthcoming commit)
- All jobs prepare to call either commit/abort
- If any job fails, is canceled, or fails preparation, all jobs
call their .abort callback.
- All jobs enter the PENDING state, awaiting manual intervention
(also added in a forthcoming commit)
- block-job-finalize is issued by the user/management layer
- All jobs call their commit callbacks.
Signed-off-by: John Snow <jsnow@redhat.com>
---
blockjob.c | 34 +++++++++++++++++++++++++++++++---
include/block/blockjob_int.h | 10 ++++++++++
2 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/blockjob.c b/blockjob.c
index 8f02c03880..1c010ec100 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -394,6 +394,18 @@ static void block_job_update_rc(BlockJob *job)
}
}
+static int block_job_prepare(BlockJob *job)
+{
+ if (job->ret) {
+ goto out;
+ }
+ if (job->driver->prepare) {
+ job->ret = job->driver->prepare(job);
+ }
+ out:
+ return job->ret;
+}
+
static void block_job_commit(BlockJob *job)
{
assert(!job->ret);
@@ -417,7 +429,7 @@ static void block_job_clean(BlockJob *job)
}
}
-static void block_job_completed_single(BlockJob *job)
+static int block_job_completed_single(BlockJob *job)
{
assert(job->completed);
@@ -452,6 +464,7 @@ static void block_job_completed_single(BlockJob *job)
block_job_txn_unref(job->txn);
block_job_event_concluded(job);
block_job_unref(job);
+ return 0;
}
static void block_job_cancel_async(BlockJob *job)
@@ -467,17 +480,22 @@ static void block_job_cancel_async(BlockJob *job)
job->cancelled = true;
}
-static void block_job_txn_apply(BlockJobTxn *txn, void fn(BlockJob *))
+static int block_job_txn_apply(BlockJobTxn *txn, int fn(BlockJob *))
{
AioContext *ctx;
BlockJob *job, *next;
+ int rc;
QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
ctx = blk_get_aio_context(job->blk);
aio_context_acquire(ctx);
- fn(job);
+ rc = fn(job);
aio_context_release(ctx);
+ if (rc) {
+ break;
+ }
}
+ return rc;
}
static void block_job_do_dismiss(BlockJob *job)
@@ -567,6 +585,8 @@ static void block_job_completed_txn_success(BlockJob *job)
{
BlockJobTxn *txn = job->txn;
BlockJob *other_job;
+ int rc = 0;
+
/*
* Successful completion, see if there are other running jobs in this
* txn.
@@ -576,6 +596,14 @@ static void block_job_completed_txn_success(BlockJob *job)
return;
}
}
+
+ /* Jobs may require some prep-work to complete without failure */
+ rc = block_job_txn_apply(txn, block_job_prepare);
+ if (rc) {
+ block_job_completed_txn_abort(job);
+ return;
+ }
+
/* We are the last completed job, commit the transaction. */
block_job_txn_apply(txn, block_job_completed_single);
}
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 259d49b32a..642adce68b 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -53,6 +53,16 @@ struct BlockJobDriver {
*/
void (*complete)(BlockJob *job, Error **errp);
+ /**
+ * If the callback is not NULL, prepare will be invoked when all the jobs
+ * belonging to the same transaction complete; or upon this job's completion
+ * if it is not in a transaction.
+ *
+ * This callback will not be invoked if the job has already failed.
+ * If it fails, abort and then clean will be called.
+ */
+ int (*prepare)(BlockJob *job);
+
/**
* If the callback is not NULL, it will be invoked when all the jobs
* belonging to the same transaction complete; or upon this job's
--
2.14.3
On 02/23/2018 05:51 PM, John Snow wrote: > Some jobs upon finalization may need to perform some work that can > still fail. If these jobs are part of a transaction, it's important > that these callbacks fail the entire transaction. > > We allow for a new callback in addition to commit/abort/clean that > allows us the opportunity to have fairly late-breaking failures > in the transactional process. > > The expected flow is: > > - All jobs in a transaction converge to the WAITING state > (added in a forthcoming commit) > - All jobs prepare to call either commit/abort > - If any job fails, is canceled, or fails preparation, all jobs > call their .abort callback. > - All jobs enter the PENDING state, awaiting manual intervention > (also added in a forthcoming commit) > - block-job-finalize is issued by the user/management layer > - All jobs call their commit callbacks. > > Signed-off-by: John Snow <jsnow@redhat.com> > --- > blockjob.c | 34 +++++++++++++++++++++++++++++++--- > include/block/blockjob_int.h | 10 ++++++++++ > 2 files changed, 41 insertions(+), 3 deletions(-) > > @@ -467,17 +480,22 @@ static void block_job_cancel_async(BlockJob *job) > job->cancelled = true; > } > > -static void block_job_txn_apply(BlockJobTxn *txn, void fn(BlockJob *)) > +static int block_job_txn_apply(BlockJobTxn *txn, int fn(BlockJob *)) > { > AioContext *ctx; > BlockJob *job, *next; > + int rc; > > QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) { > ctx = blk_get_aio_context(job->blk); > aio_context_acquire(ctx); > - fn(job); > + rc = fn(job); > aio_context_release(ctx); > + if (rc) { > + break; > + } This short-circuits the application of the function to the rest of the group. Is that ever going to be a problem? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
On 02/27/2018 02:56 PM, Eric Blake wrote: > On 02/23/2018 05:51 PM, John Snow wrote: >> Some jobs upon finalization may need to perform some work that can >> still fail. If these jobs are part of a transaction, it's important >> that these callbacks fail the entire transaction. >> >> We allow for a new callback in addition to commit/abort/clean that >> allows us the opportunity to have fairly late-breaking failures >> in the transactional process. >> >> The expected flow is: >> >> - All jobs in a transaction converge to the WAITING state >> (added in a forthcoming commit) >> - All jobs prepare to call either commit/abort >> - If any job fails, is canceled, or fails preparation, all jobs >> call their .abort callback. >> - All jobs enter the PENDING state, awaiting manual intervention >> (also added in a forthcoming commit) >> - block-job-finalize is issued by the user/management layer >> - All jobs call their commit callbacks. >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> --- >> blockjob.c | 34 +++++++++++++++++++++++++++++++--- >> include/block/blockjob_int.h | 10 ++++++++++ >> 2 files changed, 41 insertions(+), 3 deletions(-) >> > >> @@ -467,17 +480,22 @@ static void block_job_cancel_async(BlockJob *job) >> job->cancelled = true; >> } >> -static void block_job_txn_apply(BlockJobTxn *txn, void fn(BlockJob *)) >> +static int block_job_txn_apply(BlockJobTxn *txn, int fn(BlockJob *)) >> { >> AioContext *ctx; >> BlockJob *job, *next; >> + int rc; >> QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) { >> ctx = blk_get_aio_context(job->blk); >> aio_context_acquire(ctx); >> - fn(job); >> + rc = fn(job); >> aio_context_release(ctx); >> + if (rc) { >> + break; >> + } > > This short-circuits the application of the function to the rest of the > group. Is that ever going to be a problem? > With what I've written, I don't think so -- but I can't guarantee someone won't misunderstand the semantics of it and it will become a problem. It is a potentially dangerous function in that way. --js
Am 24.02.2018 um 00:51 hat John Snow geschrieben: > Some jobs upon finalization may need to perform some work that can > still fail. If these jobs are part of a transaction, it's important > that these callbacks fail the entire transaction. > > We allow for a new callback in addition to commit/abort/clean that > allows us the opportunity to have fairly late-breaking failures > in the transactional process. > > The expected flow is: > > - All jobs in a transaction converge to the WAITING state > (added in a forthcoming commit) > - All jobs prepare to call either commit/abort > - If any job fails, is canceled, or fails preparation, all jobs > call their .abort callback. > - All jobs enter the PENDING state, awaiting manual intervention > (also added in a forthcoming commit) > - block-job-finalize is issued by the user/management layer > - All jobs call their commit callbacks. > > Signed-off-by: John Snow <jsnow@redhat.com> You almost made me believe the scary thought that we need transactional graph modifications, but after writing half of the reply, I think it's just that your order here is wrong. So .prepare is the last thing in the whole process that is allowed to fail. Graph manipulations such as bdrv_replace_node() can fail. Graph manipulations can also only be made in response to block-job-finalize because the management layer must be aware of them. Take them together and you have a problem. Didn't we already establish earlier that .prepare/.commit/.abort must be called together and cannot be separated by waiting for a QMP command because of locking and things? So if you go to PENDING first, then wait for block-job-finalize and only then call .prepare/.commit/.abort, we should be okay for both problems. And taking a look at the final state, that seems to be what you do, so in the end, it's probably just the commit message that needs a fix. > blockjob.c | 34 +++++++++++++++++++++++++++++++--- > include/block/blockjob_int.h | 10 ++++++++++ > 2 files changed, 41 insertions(+), 3 deletions(-) > > diff --git a/blockjob.c b/blockjob.c > index 8f02c03880..1c010ec100 100644 > --- a/blockjob.c > +++ b/blockjob.c > @@ -394,6 +394,18 @@ static void block_job_update_rc(BlockJob *job) > } > } > > +static int block_job_prepare(BlockJob *job) > +{ > + if (job->ret) { > + goto out; > + } > + if (job->driver->prepare) { > + job->ret = job->driver->prepare(job); > + } > + out: > + return job->ret; > +} Why not just if (job->ret == 0 && job->driver->prepare) and save the goto? Kevin
On 02/28/2018 12:04 PM, Kevin Wolf wrote: > Am 24.02.2018 um 00:51 hat John Snow geschrieben: >> Some jobs upon finalization may need to perform some work that can >> still fail. If these jobs are part of a transaction, it's important >> that these callbacks fail the entire transaction. >> >> We allow for a new callback in addition to commit/abort/clean that >> allows us the opportunity to have fairly late-breaking failures >> in the transactional process. >> >> The expected flow is: >> >> - All jobs in a transaction converge to the WAITING state >> (added in a forthcoming commit) >> - All jobs prepare to call either commit/abort >> - If any job fails, is canceled, or fails preparation, all jobs >> call their .abort callback. >> - All jobs enter the PENDING state, awaiting manual intervention >> (also added in a forthcoming commit) >> - block-job-finalize is issued by the user/management layer >> - All jobs call their commit callbacks. >> >> Signed-off-by: John Snow <jsnow@redhat.com> > > You almost made me believe the scary thought that we need transactional > graph modifications, but after writing half of the reply, I think it's > just that your order here is wrong. > Sorry, yes, this blurb was outdated. I regret that it wasted your time. > So .prepare is the last thing in the whole process that is allowed to > fail. Graph manipulations such as bdrv_replace_node() can fail. Graph > manipulations can also only be made in response to block-job-finalize > because the management layer must be aware of them. Take them together > and you have a problem. > > Didn't we already establish earlier that .prepare/.commit/.abort must be > called together and cannot be separated by waiting for a QMP command > because of locking and things? > Right; so what really happens is that in response to the FINALIZE verb, the prepare loop is done first to check for success, and then commit or abort are dispatched as appropriate. > So if you go to PENDING first, then wait for block-job-finalize and only > then call .prepare/.commit/.abort, we should be okay for both problems. > > And taking a look at the final state, that seems to be what you do, so > in the end, it's probably just the commit message that needs a fix. Yep, sorry. > >> blockjob.c | 34 +++++++++++++++++++++++++++++++--- >> include/block/blockjob_int.h | 10 ++++++++++ >> 2 files changed, 41 insertions(+), 3 deletions(-) >> >> diff --git a/blockjob.c b/blockjob.c >> index 8f02c03880..1c010ec100 100644 >> --- a/blockjob.c >> +++ b/blockjob.c >> @@ -394,6 +394,18 @@ static void block_job_update_rc(BlockJob *job) >> } >> } >> >> +static int block_job_prepare(BlockJob *job) >> +{ >> + if (job->ret) { >> + goto out; >> + } >> + if (job->driver->prepare) { >> + job->ret = job->driver->prepare(job); >> + } >> + out: >> + return job->ret; >> +} > > Why not just if (job->ret == 0 && job->driver->prepare) and save the > goto? > Churn. ¯\_(ツ)_/¯ > Kevin >
© 2016 - 2025 Red Hat, Inc.