Finalizing the job may cause its AioContext to change. This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.
However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort(). The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end. If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").
Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
job.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/job.c b/job.c
index e7a5d28854..3fe23bb77e 100644
--- a/job.c
+++ b/job.c
@@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
static void job_completed_txn_abort(Job *job)
{
- AioContext *outer_ctx = job->aio_context;
AioContext *ctx;
JobTxn *txn = job->txn;
Job *other_job;
@@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
txn->aborting = true;
job_txn_ref(txn);
- /* We can only hold the single job's AioContext lock while calling
+ /*
+ * We can only hold the single job's AioContext lock while calling
* job_finalize_single() because the finalization callbacks can involve
- * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
- aio_context_release(outer_ctx);
+ * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+ * Note that the job's AioContext may change when it is finalized.
+ */
+ job_ref(job);
+ aio_context_release(job->aio_context);
/* Other jobs are effectively cancelled by us, set the status for
* them; this job, however, may or may not be cancelled, depending
@@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
}
while (!QLIST_EMPTY(&txn->jobs)) {
other_job = QLIST_FIRST(&txn->jobs);
+ /*
+ * The job's AioContext may change, so store it in @ctx so we
+ * release the same context that we have acquired before.
+ */
ctx = other_job->aio_context;
aio_context_acquire(ctx);
if (!job_is_completed(other_job)) {
@@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
aio_context_release(ctx);
}
- aio_context_acquire(outer_ctx);
+ /*
+ * Use job_ref()/job_unref() so we can read the AioContext here
+ * even if the job went away during job_finalize_single().
+ */
+ ctx = job->aio_context;
+ job_unref(job);
+ aio_context_acquire(ctx);
job_txn_unref(txn);
}
--
2.31.1
On Fri, Aug 06, 2021 at 11:38:48AM +0200, Max Reitz wrote:
> Finalizing the job may cause its AioContext to change. This is noted by
> job_exit(), which points at job_txn_apply() to take this fact into
> account.
>
> However, job_completed() does not necessarily invoke job_txn_apply()
> (through job_completed_txn_success()), but potentially also
> job_completed_txn_abort(). The latter stores the context in a local
> variable, and so always acquires the same context at its end that it has
> released in the beginning -- which may be a different context from the
> one that job_exit() releases at its end. If it is different, qemu
> aborts ("qemu_mutex_unlock_impl: Operation not permitted").
Is this a bug fix that needs to make it into 6.1?
>
> Drop the local @outer_ctx variable from job_completed_txn_abort(), and
> instead re-acquire the actual job's context at the end of the function,
> so job_exit() will release the same.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> job.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
The commit message makes sense, and does a good job at explaining the
change. I'm still a bit fuzzy on how jobs are supposed to play nice
with contexts, but since your patch matches the commit message, I'm
happy to give:
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
On 06.08.21 21:16, Eric Blake wrote:
> On Fri, Aug 06, 2021 at 11:38:48AM +0200, Max Reitz wrote:
>> Finalizing the job may cause its AioContext to change. This is noted by
>> job_exit(), which points at job_txn_apply() to take this fact into
>> account.
>>
>> However, job_completed() does not necessarily invoke job_txn_apply()
>> (through job_completed_txn_success()), but potentially also
>> job_completed_txn_abort(). The latter stores the context in a local
>> variable, and so always acquires the same context at its end that it has
>> released in the beginning -- which may be a different context from the
>> one that job_exit() releases at its end. If it is different, qemu
>> aborts ("qemu_mutex_unlock_impl: Operation not permitted").
> Is this a bug fix that needs to make it into 6.1?
Well, I only encountered it as part of this series (which I really don’t
think is 6.2 material at this point), and so I don’t know.
Can’t hurt, I suppose, but if we wanted this to be in 6.1, we’d better
have a specific test for it, I think.
>> Drop the local @outer_ctx variable from job_completed_txn_abort(), and
>> instead re-acquire the actual job's context at the end of the function,
>> so job_exit() will release the same.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> job.c | 23 ++++++++++++++++++-----
>> 1 file changed, 18 insertions(+), 5 deletions(-)
> The commit message makes sense, and does a good job at explaining the
> change. I'm still a bit fuzzy on how jobs are supposed to play nice
> with contexts,
I can relate :)
> but since your patch matches the commit message, I'm
> happy to give:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
Thanks!
06.08.2021 12:38, Max Reitz wrote:
> Finalizing the job may cause its AioContext to change. This is noted by
> job_exit(), which points at job_txn_apply() to take this fact into
> account.
>
> However, job_completed() does not necessarily invoke job_txn_apply()
> (through job_completed_txn_success()), but potentially also
> job_completed_txn_abort(). The latter stores the context in a local
> variable, and so always acquires the same context at its end that it has
> released in the beginning -- which may be a different context from the
> one that job_exit() releases at its end. If it is different, qemu
> aborts ("qemu_mutex_unlock_impl: Operation not permitted").
>
> Drop the local @outer_ctx variable from job_completed_txn_abort(), and
> instead re-acquire the actual job's context at the end of the function,
> so job_exit() will release the same.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> job.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/job.c b/job.c
> index e7a5d28854..3fe23bb77e 100644
> --- a/job.c
> +++ b/job.c
> @@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
>
> static void job_completed_txn_abort(Job *job)
> {
> - AioContext *outer_ctx = job->aio_context;
> AioContext *ctx;
> JobTxn *txn = job->txn;
> Job *other_job;
> @@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
> txn->aborting = true;
> job_txn_ref(txn);
>
> - /* We can only hold the single job's AioContext lock while calling
> + /*
> + * We can only hold the single job's AioContext lock while calling
> * job_finalize_single() because the finalization callbacks can involve
> - * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
> - aio_context_release(outer_ctx);
> + * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
> + * Note that the job's AioContext may change when it is finalized.
> + */
> + job_ref(job);
> + aio_context_release(job->aio_context);
>
> /* Other jobs are effectively cancelled by us, set the status for
> * them; this job, however, may or may not be cancelled, depending
> @@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
> }
> while (!QLIST_EMPTY(&txn->jobs)) {
> other_job = QLIST_FIRST(&txn->jobs);
> + /*
> + * The job's AioContext may change, so store it in @ctx so we
> + * release the same context that we have acquired before.
> + */
> ctx = other_job->aio_context;
> aio_context_acquire(ctx);
> if (!job_is_completed(other_job)) {
> @@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
> aio_context_release(ctx);
> }
>
> - aio_context_acquire(outer_ctx);
> + /*
> + * Use job_ref()/job_unref() so we can read the AioContext here
> + * even if the job went away during job_finalize_single().
> + */
> + ctx = job->aio_context;
> + job_unref(job);
> + aio_context_acquire(ctx);
why to use ctx variable and not do it exactly same as in job_txn_apply() :
aio_context_acquire(job->aio_context);
job_unref(job);
?
anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
--
Best regards,
Vladimir
On 01.09.21 12:05, Vladimir Sementsov-Ogievskiy wrote:
> 06.08.2021 12:38, Max Reitz wrote:
>> Finalizing the job may cause its AioContext to change. This is noted by
>> job_exit(), which points at job_txn_apply() to take this fact into
>> account.
>>
>> However, job_completed() does not necessarily invoke job_txn_apply()
>> (through job_completed_txn_success()), but potentially also
>> job_completed_txn_abort(). The latter stores the context in a local
>> variable, and so always acquires the same context at its end that it has
>> released in the beginning -- which may be a different context from the
>> one that job_exit() releases at its end. If it is different, qemu
>> aborts ("qemu_mutex_unlock_impl: Operation not permitted").
>>
>> Drop the local @outer_ctx variable from job_completed_txn_abort(), and
>> instead re-acquire the actual job's context at the end of the function,
>> so job_exit() will release the same.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> job.c | 23 ++++++++++++++++++-----
>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/job.c b/job.c
>> index e7a5d28854..3fe23bb77e 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
>> static void job_completed_txn_abort(Job *job)
>> {
>> - AioContext *outer_ctx = job->aio_context;
>> AioContext *ctx;
>> JobTxn *txn = job->txn;
>> Job *other_job;
>> @@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
>> txn->aborting = true;
>> job_txn_ref(txn);
>> - /* We can only hold the single job's AioContext lock while
>> calling
>> + /*
>> + * We can only hold the single job's AioContext lock while calling
>> * job_finalize_single() because the finalization callbacks can
>> involve
>> - * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
>> - aio_context_release(outer_ctx);
>> + * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
>> + * Note that the job's AioContext may change when it is finalized.
>> + */
>> + job_ref(job);
>> + aio_context_release(job->aio_context);
>> /* Other jobs are effectively cancelled by us, set the status
>> for
>> * them; this job, however, may or may not be cancelled, depending
>> @@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
>> }
>> while (!QLIST_EMPTY(&txn->jobs)) {
>> other_job = QLIST_FIRST(&txn->jobs);
>> + /*
>> + * The job's AioContext may change, so store it in @ctx so we
>> + * release the same context that we have acquired before.
>> + */
>> ctx = other_job->aio_context;
>> aio_context_acquire(ctx);
>> if (!job_is_completed(other_job)) {
>> @@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
>> aio_context_release(ctx);
>> }
>> - aio_context_acquire(outer_ctx);
>> + /*
>> + * Use job_ref()/job_unref() so we can read the AioContext here
>> + * even if the job went away during job_finalize_single().
>> + */
>> + ctx = job->aio_context;
>> + job_unref(job);
>> + aio_context_acquire(ctx);
>
>
> why to use ctx variable and not do it exactly same as in
> job_txn_apply() :
>
> aio_context_acquire(job->aio_context);
> job_unref(job);
>
> ?
Oh, I just didn’t think of that. Sounds good, thanks!
Hanna
> anyway:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
© 2016 - 2026 Red Hat, Inc.