From nobody Mon Feb 9 23:18:12 2026
From: Stefan Reiter
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, slp@redhat.com,
    mreitz@redhat.com, stefanha@redhat.com, jsnow@redhat.com, dietmar@proxmox.com
Subject: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply
Date: Wed, 1 Apr 2020 10:15:02 +0200
Message-Id: <20200401081504.200017-2-s.reiter@proxmox.com>
In-Reply-To: <20200401081504.200017-1-s.reiter@proxmox.com>
References: <20200401081504.200017-1-s.reiter@proxmox.com>

All callers of job_txn_apply hold a single job's lock, but different jobs
within a transaction can have different contexts, thus we need to lock
each one individually before applying the callback function.

Similar to job_completed_txn_abort, this also requires releasing the
caller's context beforehand and reacquiring it afterwards, to avoid
recursive locks which might break AIO_WAIT_WHILE in the callback.
This also brings to light a different issue: when a callback function in
job_txn_apply moves its job to a different AIO context, job_exit will try
to release the wrong lock (now that we re-acquire the lock correctly;
previously it would just continue with the old lock, leaving the job
unlocked for the rest of the code path back to job_exit). Fix this by not
caching the job's context in job_exit, and add a comment about why this
is done.

One test needed adapting, since it calls job_finalize directly and so
manually needs to acquire the correct context.

Signed-off-by: Stefan Reiter
---
 job.c                 | 48 ++++++++++++++++++++++++++++++++---------
 tests/test-blockjob.c |  2 ++
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/job.c b/job.c
index 134a07b92e..5fbaaabf78 100644
--- a/job.c
+++ b/job.c
@@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job)
     }
 }
 
-static int job_txn_apply(JobTxn *txn, int fn(Job *))
+static int job_txn_apply(Job *job, int fn(Job *))
 {
-    Job *job, *next;
+    AioContext *inner_ctx;
+    Job *other_job, *next;
+    JobTxn *txn = job->txn;
     int rc = 0;
 
-    QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
-        rc = fn(job);
+    /*
+     * Similar to job_completed_txn_abort, we take each job's lock before
+     * applying fn, but since we assume that outer_ctx is held by the caller,
+     * we need to release it here to avoid holding the lock twice - which would
+     * break AIO_WAIT_WHILE from within fn.
+     */
+    aio_context_release(job->aio_context);
+
+    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
+        inner_ctx = other_job->aio_context;
+        aio_context_acquire(inner_ctx);
+        rc = fn(other_job);
+        aio_context_release(inner_ctx);
         if (rc) {
             break;
         }
     }
+
+    /*
+     * Note that job->aio_context might have been changed by calling fn, so we
+     * can't use a local variable to cache it.
+     */
+    aio_context_acquire(job->aio_context);
     return rc;
 }
 
@@ -774,11 +793,11 @@ static void job_do_finalize(Job *job)
     assert(job && job->txn);
 
     /* prepare the transaction to complete */
-    rc = job_txn_apply(job->txn, job_prepare);
+    rc = job_txn_apply(job, job_prepare);
     if (rc) {
         job_completed_txn_abort(job);
     } else {
-        job_txn_apply(job->txn, job_finalize_single);
+        job_txn_apply(job, job_finalize_single);
     }
 }
 
@@ -824,10 +843,10 @@ static void job_completed_txn_success(Job *job)
         assert(other_job->ret == 0);
     }
 
-    job_txn_apply(txn, job_transition_to_pending);
+    job_txn_apply(job, job_transition_to_pending);
 
     /* If no jobs need manual finalization, automatically do so */
-    if (job_txn_apply(txn, job_needs_finalize) == 0) {
+    if (job_txn_apply(job, job_needs_finalize) == 0) {
         job_do_finalize(job);
     }
 }
@@ -849,9 +868,10 @@ static void job_completed(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
-    AioContext *ctx = job->aio_context;
+    AioContext *ctx;
 
-    aio_context_acquire(ctx);
+    job_ref(job);
+    aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
      * callbacks. However, completion callbacks tend to involve operations that
@@ -862,6 +882,14 @@
 
     job_completed(job);
 
+    /*
+     * Note that calling job_completed can move the job to a different
+     * aio_context, so we cannot cache from above. job_txn_apply takes care of
+     * acquiring the new lock, and we ref/unref to avoid job_completed freeing
+     * the job underneath us.
+     */
+    ctx = job->aio_context;
+    job_unref(job);
     aio_context_release(ctx);
 }
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 4eeb184caf..7519847912 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -367,7 +367,9 @@ static void test_cancel_concluded(void)
     aio_poll(qemu_get_aio_context(), true);
     assert(job->status == JOB_STATUS_PENDING);
 
+    aio_context_acquire(job->aio_context);
     job_finalize(job, &error_abort);
+    aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
-- 
2.26.0

From nobody Mon Feb 9 23:18:12 2026
From: Stefan Reiter
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync
Date: Wed, 1 Apr 2020 10:15:03 +0200
Message-Id: <20200401081504.200017-3-s.reiter@proxmox.com>
In-Reply-To: <20200401081504.200017-1-s.reiter@proxmox.com>
References: <20200401081504.200017-1-s.reiter@proxmox.com>
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com,
    slp@redhat.com, mreitz@redhat.com, stefanha@redhat.com,
    jsnow@redhat.com, dietmar@proxmox.com

job_cancel_sync requires the job's lock to be held; all other callers
already do this (replication_stop, drive_backup_abort,
blockdev_backup_abort, job_cancel_sync_all, cancel_common).

Signed-off-by: Stefan Reiter
---
 block/replication.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/replication.c b/block/replication.c
index 413d95407d..17ddc31569 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -144,12 +144,18 @@ fail:
 static void replication_close(BlockDriverState *bs)
 {
     BDRVReplicationState *s = bs->opaque;
+    Job *commit_job;
+    AioContext *commit_ctx;
 
     if (s->stage == BLOCK_REPLICATION_RUNNING) {
         replication_stop(s->rs, false, NULL);
     }
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
-        job_cancel_sync(&s->commit_job->job);
+        commit_job = &s->commit_job->job;
+        commit_ctx = commit_job->aio_context;
+        aio_context_acquire(commit_ctx);
+        job_cancel_sync(commit_job);
+        aio_context_release(commit_ctx);
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
-- 
2.26.0

From nobody Mon Feb 9 23:18:12 2026
From: Stefan Reiter
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, slp@redhat.com,
    mreitz@redhat.com, stefanha@redhat.com, jsnow@redhat.com, dietmar@proxmox.com
Subject: [PATCH v4 3/3] backup: don't acquire aio_context in backup_clean
Date: Wed, 1 Apr 2020 10:15:04 +0200
Message-Id: <20200401081504.200017-4-s.reiter@proxmox.com>
In-Reply-To: <20200401081504.200017-1-s.reiter@proxmox.com>
References: <20200401081504.200017-1-s.reiter@proxmox.com>

All code paths leading to backup_clean (via job_clean) have the job's
context already acquired. The job's context is guaranteed to be the same
as the one used by backup_top via backup_job_create.

Since the previous logic effectively acquired the lock twice, this broke
cleanup of backups for disks using IO threads: the BDRV_POLL_WHILE in
bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
once, thus deadlocking with the IO thread.

This is a partial revert of 0abf2581717a19.

Signed-off-by: Stefan Reiter
Reviewed-by: Max Reitz
---

With the two previous patches applied, the commit message should now hold
true.
 block/backup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7430ca5883..a7a7dcaf4c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -126,11 +126,7 @@ static void backup_abort(Job *job)
 static void backup_clean(Job *job)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
-    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
-
-    aio_context_acquire(aio_context);
     bdrv_backup_top_drop(s->backup_top);
-    aio_context_release(aio_context);
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
-- 
2.26.0