From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843298852937.6473494386915; Thu, 13 Sep 2018 05:54:58 -0700 (PDT) Received: from localhost ([::1]:42269 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R8w-0003Je-SB for importer@patchew.org; Thu, 13 Sep 2018 08:54:46 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46520) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R6u-00021h-Q5 for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R6t-0001aW-Se for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:40 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39028) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R6p-0001WY-3h; Thu, 13 Sep 2018 08:52:35 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 72912804EE; Thu, 13 Sep 2018 12:52:34 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9A0D660BE0; Thu, 13 Sep 2018 12:52:32 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:01 +0200 Message-Id: <20180913125217.23173-2-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Thu, 13 Sep 2018 12:52:34 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" job_completed() had a problem with double locking that was recently fixed independently by two different commits: "job: Fix nested aio_poll() hanging in job_txn_apply" "jobs: add exit shim" One fix removed the first aio_context_acquire(), the other fix removed the other one. Now we have a bug again and the code is run without any locking. Add it back in one of the places. Signed-off-by: Kevin Wolf Reviewed-by: John Snow Reviewed-by: Max Reitz --- job.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/job.c b/job.c index 82b46929bd..5c4e84f007 100644 --- a/job.c +++ b/job.c @@ -847,7 +847,11 @@ static void job_completed(Job *job) static void job_exit(void *opaque) { Job *job =3D (Job *)opaque; + AioContext *ctx =3D job->aio_context; + + aio_context_acquire(ctx); job_completed(job); + aio_context_release(ctx); } =20 /** --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843298919676.2733006841235; Thu, 13 Sep 2018 05:54:58 -0700 (PDT) Received: from localhost ([::1]:42270 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R8z-0003MD-7F for importer@patchew.org; Thu, 13 Sep 2018 08:54:49 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46545) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R6v-000227-Df for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R6u-0001b5-Bi for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:41 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42453) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R6r-0001Xr-7m; Thu, 13 Sep 2018 08:52:37 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 92A56308FBA6; Thu, 13 Sep 2018 12:52:36 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id BE8B660BE0; Thu, 13 Sep 2018 12:52:34 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:02 +0200 Message-Id: <20180913125217.23173-3-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Thu, 13 Sep 2018 12:52:36 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 02/17] blockjob: Wake up BDS when job becomes idle X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" In the context of draining a BDS, the .drained_poll callback of block jobs is called. If this returns true (i.e. there is still some activity pending), the drain operation may call aio_poll() with blocking=3Dtrue to wait for completion. As soon as the pending activity is completed and the job finally arrives in a quiescent state (i.e. its coroutine either yields with busy=3Dfalse or terminates), the block job must notify the aio_poll() loop to wake up, otherwise we get a deadlock if both are running in different threads. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng Reviewed-by: Max Reitz --- include/block/blockjob.h | 13 +++++++++++++ include/qemu/job.h | 3 +++ blockjob.c | 18 ++++++++++++++++++ job.c | 7 +++++++ 4 files changed, 41 insertions(+) diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 32c00b7dc0..2290bbb824 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -70,6 +70,9 @@ typedef struct BlockJob { /** Called when the job transitions to READY */ Notifier ready_notifier; =20 + /** Called when the job coroutine yields or terminates */ + Notifier idle_notifier; + /** BlockDriverStates that are involved in this block job */ GSList *nodes; } BlockJob; @@ -119,6 +122,16 @@ int block_job_add_bdrv(BlockJob *job, const char *name= , BlockDriverState *bs, void block_job_remove_all_bdrv(BlockJob *job); =20 /** + * block_job_wakeup_all_bdrv: + * @job: The block job + * + * Calls bdrv_wakeup() for all BlockDriverStates that have been added to t= he + * job. This function is to be called whenever child_job_drained_poll() wo= uld + * go from true to false to notify waiting drain requests. + */ +void block_job_wakeup_all_bdrv(BlockJob *job); + +/** * block_job_set_speed: * @job: The job to set the speed for. * @speed: The new value diff --git a/include/qemu/job.h b/include/qemu/job.h index 5cb0681834..b4a784d3cc 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -156,6 +156,9 @@ typedef struct Job { /** Notifiers called when the job transitions to READY */ NotifierList on_ready; =20 + /** Notifiers called when the job coroutine yields or terminates */ + NotifierList on_idle; + /** Element of the list of jobs */ QLIST_ENTRY(Job) job_list; =20 diff --git a/blockjob.c b/blockjob.c index be5903aa96..8d27e8e1ea 100644 --- a/blockjob.c +++ b/blockjob.c @@ -221,6 +221,22 @@ int block_job_add_bdrv(BlockJob *job, const char *name= , BlockDriverState *bs, return 0; } =20 +void block_job_wakeup_all_bdrv(BlockJob *job) +{ + GSList *l; + + for (l =3D job->nodes; l; l =3D l->next) { + BdrvChild *c =3D l->data; + bdrv_wakeup(c->bs); + } +} + +static void block_job_on_idle(Notifier *n, void *opaque) +{ + BlockJob *job =3D opaque; + block_job_wakeup_all_bdrv(job); +} + bool block_job_is_internal(BlockJob *job) { return (job->job.id =3D=3D NULL); @@ -419,6 +435,7 @@ void *block_job_create(const char *job_id, const BlockJ= obDriver *driver, job->finalize_completed_notifier.notify =3D block_job_event_completed; job->pending_notifier.notify =3D block_job_event_pending; job->ready_notifier.notify =3D block_job_event_ready; + job->idle_notifier.notify =3D block_job_on_idle; =20 notifier_list_add(&job->job.on_finalize_cancelled, &job->finalize_cancelled_notifier); @@ -426,6 +443,7 @@ void *block_job_create(const char *job_id, const BlockJ= obDriver *driver, &job->finalize_completed_notifier); notifier_list_add(&job->job.on_pending, &job->pending_notifier); notifier_list_add(&job->job.on_ready, &job->ready_notifier); + notifier_list_add(&job->job.on_idle, &job->idle_notifier); =20 error_setg(&job->blocker, "block device is in use by block job: %s", job_type_str(&job->job)); diff --git a/job.c b/job.c index 5c4e84f007..48a767c102 100644 --- a/job.c +++ b/job.c @@ -402,6 +402,11 @@ static void job_event_ready(Job *job) notifier_list_notify(&job->on_ready, job); } =20 +static void job_event_idle(Job *job) +{ + notifier_list_notify(&job->on_idle, job); +} + void job_enter_cond(Job *job, bool(*fn)(Job *job)) { if (!job_started(job)) { @@ -447,6 +452,7 @@ static void coroutine_fn job_do_yield(Job *job, uint64_= t ns) timer_mod(&job->sleep_timer, ns); } job->busy =3D false; + job_event_idle(job); job_unlock(); qemu_coroutine_yield(); =20 @@ -865,6 +871,7 @@ static void coroutine_fn job_co_entry(void *opaque) assert(job && job->driver && job->driver->run); job_pause_point(job); job->ret =3D job->driver->run(job, &job->err); + job_event_idle(job); job->deferred_to_main_loop =3D true; aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job); } --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843489513296.6842231199207; Thu, 13 Sep 2018 05:58:09 -0700 (PDT) Received: from localhost ([::1]:42293 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RCC-00069x-FA for importer@patchew.org; Thu, 13 Sep 2018 08:58:08 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46615) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R73-00027p-Gj for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R6y-0001eA-Cm for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:49 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36490) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R6t-0001Zj-D5; Thu, 13 Sep 2018 08:52:39 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BA68F3082E64; Thu, 13 Sep 2018 12:52:38 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id DFA4460BE0; Thu, 13 Sep 2018 12:52:36 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:03 +0200 Message-Id: <20180913125217.23173-4-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Thu, 13 Sep 2018 12:52:38 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Even if AIO_WAIT_WHILE() is called in the home context of the AioContext, we still want to allow the condition to change depending on other threads as long as they kick the AioWait. Specfically block jobs can be running in an I/O thread and should then be able to kick a drain in the main loop context. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng --- include/block/aio-wait.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h index c85a62f798..46ba7f9111 100644 --- a/include/block/aio-wait.h +++ b/include/block/aio-wait.h @@ -77,10 +77,12 @@ typedef struct { AioWait *wait_ =3D (wait); \ AioContext *ctx_ =3D (ctx); \ if (ctx_ && in_aio_context_home_thread(ctx_)) { \ + atomic_inc(&wait_->num_waiters); \ while ((cond)) { \ aio_poll(ctx_, true); \ waited_ =3D true; \ } \ + atomic_dec(&wait_->num_waiters); \ } else { \ assert(qemu_get_current_aio_context() =3D=3D \ qemu_get_aio_context()); \ --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 153684348062751.240423993043464; Thu, 13 Sep 2018 05:58:00 -0700 (PDT) Received: from localhost ([::1]:42290 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RC3-00061o-GU for importer@patchew.org; Thu, 13 Sep 2018 08:57:59 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46617) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R73-00027q-HF for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R6y-0001e6-CD for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:49 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36310) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R6v-0001bc-K9; Thu, 13 Sep 2018 08:52:41 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E02FE307D84F; Thu, 13 Sep 2018 12:52:40 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1321D60BE0; Thu, 13 Sep 2018 12:52:38 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:04 +0200 Message-Id: <20180913125217.23173-5-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Thu, 13 Sep 2018 12:52:40 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 04/17] test-bdrv-drain: Drain with block jobs in an I/O thread X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This extends the existing drain test with a block job to include variants where the block job runs in a different AioContext. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng --- tests/test-bdrv-drain.c | 92 +++++++++++++++++++++++++++++++++++++++++++++= ---- 1 file changed, 86 insertions(+), 6 deletions(-) diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index 89ac15e88a..57da22a096 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -174,6 +174,28 @@ static void do_drain_end(enum drain_type drain_type, B= lockDriverState *bs) } } =20 +static void do_drain_begin_unlocked(enum drain_type drain_type, BlockDrive= rState *bs) +{ + if (drain_type !=3D BDRV_DRAIN_ALL) { + aio_context_acquire(bdrv_get_aio_context(bs)); + } + do_drain_begin(drain_type, bs); + if (drain_type !=3D BDRV_DRAIN_ALL) { + aio_context_release(bdrv_get_aio_context(bs)); + } +} + +static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverS= tate *bs) +{ + if (drain_type !=3D BDRV_DRAIN_ALL) { + aio_context_acquire(bdrv_get_aio_context(bs)); + } + do_drain_end(drain_type, bs); + if (drain_type !=3D BDRV_DRAIN_ALL) { + aio_context_release(bdrv_get_aio_context(bs)); + } +} + static void test_drv_cb_common(enum drain_type drain_type, bool recursive) { BlockBackend *blk; @@ -785,11 +807,13 @@ BlockJobDriver test_job_driver =3D { }, }; =20 -static void test_blockjob_common(enum drain_type drain_type) +static void test_blockjob_common(enum drain_type drain_type, bool use_ioth= read) { BlockBackend *blk_src, *blk_target; BlockDriverState *src, *target; BlockJob *job; + IOThread *iothread =3D NULL; + AioContext *ctx; int ret; =20 src =3D bdrv_new_open_driver(&bdrv_test, "source", BDRV_O_RDWR, @@ -797,21 +821,31 @@ static void test_blockjob_common(enum drain_type drai= n_type) blk_src =3D blk_new(BLK_PERM_ALL, BLK_PERM_ALL); blk_insert_bs(blk_src, src, &error_abort); =20 + if (use_iothread) { + iothread =3D iothread_new(); + ctx =3D iothread_get_aio_context(iothread); + blk_set_aio_context(blk_src, ctx); + } else { + ctx =3D qemu_get_aio_context(); + } + target =3D bdrv_new_open_driver(&bdrv_test, "target", BDRV_O_RDWR, &error_abort); blk_target =3D blk_new(BLK_PERM_ALL, BLK_PERM_ALL); blk_insert_bs(blk_target, target, &error_abort); =20 + aio_context_acquire(ctx); job =3D block_job_create("job0", &test_job_driver, NULL, src, 0, BLK_P= ERM_ALL, 0, 0, NULL, NULL, &error_abort); block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abor= t); job_start(&job->job); + aio_context_release(ctx); =20 g_assert_cmpint(job->job.pause_count, =3D=3D, 0); g_assert_false(job->job.paused); g_assert_true(job->job.busy); /* We're in job_sleep_ns() */ =20 - do_drain_begin(drain_type, src); + do_drain_begin_unlocked(drain_type, src); =20 if (drain_type =3D=3D BDRV_DRAIN_ALL) { /* bdrv_drain_all() drains both src and target */ @@ -822,7 +856,14 @@ static void test_blockjob_common(enum drain_type drain= _type) g_assert_true(job->job.paused); g_assert_false(job->job.busy); /* The job is paused */ =20 - do_drain_end(drain_type, src); + do_drain_end_unlocked(drain_type, src); + + if (use_iothread) { + /* paused is reset in the I/O thread, wait for it */ + while (job->job.paused) { + aio_poll(qemu_get_aio_context(), false); + } + } =20 g_assert_cmpint(job->job.pause_count, =3D=3D, 0); g_assert_false(job->job.paused); @@ -841,32 +882,64 @@ static void test_blockjob_common(enum drain_type drai= n_type) =20 do_drain_end(drain_type, target); =20 + if (use_iothread) { + /* paused is reset in the I/O thread, wait for it */ + while (job->job.paused) { + aio_poll(qemu_get_aio_context(), false); + } + } + g_assert_cmpint(job->job.pause_count, =3D=3D, 0); g_assert_false(job->job.paused); g_assert_true(job->job.busy); /* We're in job_sleep_ns() */ =20 + aio_context_acquire(ctx); ret =3D job_complete_sync(&job->job, &error_abort); g_assert_cmpint(ret, =3D=3D, 0); =20 + if (use_iothread) { + blk_set_aio_context(blk_src, qemu_get_aio_context()); + } + aio_context_release(ctx); + blk_unref(blk_src); blk_unref(blk_target); bdrv_unref(src); bdrv_unref(target); + + if (iothread) { + iothread_join(iothread); + } } =20 static void test_blockjob_drain_all(void) { - test_blockjob_common(BDRV_DRAIN_ALL); + test_blockjob_common(BDRV_DRAIN_ALL, false); } =20 static void test_blockjob_drain(void) { - test_blockjob_common(BDRV_DRAIN); + test_blockjob_common(BDRV_DRAIN, false); } =20 static void test_blockjob_drain_subtree(void) { - test_blockjob_common(BDRV_SUBTREE_DRAIN); + test_blockjob_common(BDRV_SUBTREE_DRAIN, false); +} + +static void test_blockjob_iothread_drain_all(void) +{ + test_blockjob_common(BDRV_DRAIN_ALL, true); +} + +static void test_blockjob_iothread_drain(void) +{ + test_blockjob_common(BDRV_DRAIN, true); +} + +static void test_blockjob_iothread_drain_subtree(void) +{ + test_blockjob_common(BDRV_SUBTREE_DRAIN, true); } =20 =20 @@ -1338,6 +1411,13 @@ int main(int argc, char **argv) g_test_add_func("/bdrv-drain/blockjob/drain_subtree", test_blockjob_drain_subtree); =20 + g_test_add_func("/bdrv-drain/blockjob/iothread/drain_all", + test_blockjob_iothread_drain_all); + g_test_add_func("/bdrv-drain/blockjob/iothread/drain", + test_blockjob_iothread_drain); + g_test_add_func("/bdrv-drain/blockjob/iothread/drain_subtree", + test_blockjob_iothread_drain_subtree); + g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain); g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_a= ll); g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain); --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843699179790.4389655924879; Thu, 13 Sep 2018 06:01:39 -0700 (PDT) Received: from localhost ([::1]:42312 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RFV-00007U-Sy for importer@patchew.org; Thu, 13 Sep 2018 09:01:33 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46685) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R78-0002CT-08 for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R77-0001nX-2z for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44510) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R75-0001hV-0X; Thu, 13 Sep 2018 08:52:51 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 470E0A6E1F; Thu, 13 Sep 2018 12:52:50 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 380F760BE0; Thu, 13 Sep 2018 12:52:41 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:05 +0200 Message-Id: <20180913125217.23173-6-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Thu, 13 Sep 2018 12:52:50 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 05/17] test-blockjob: Acquire AioContext around job_cancel_sync() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" All callers in QEMU proper hold the AioContext lock when calling job_finish_sync(). test-blockjob should do the same when it calls the function indirectly through job_cancel_sync(). Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng --- include/qemu/job.h | 6 ++++++ tests/test-blockjob.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/include/qemu/job.h b/include/qemu/job.h index b4a784d3cc..63c60ef1ba 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -524,6 +524,8 @@ void job_user_cancel(Job *job, bool force, Error **errp= ); * * Returns the return value from the job if the job actually completed * during the call, or -ECANCELED if it was canceled. + * + * Callers must hold the AioContext lock of job->aio_context. */ int job_cancel_sync(Job *job); =20 @@ -541,6 +543,8 @@ void job_cancel_sync_all(void); * function). * * Returns the return value from the job. + * + * Callers must hold the AioContext lock of job->aio_context. */ int job_complete_sync(Job *job, Error **errp); =20 @@ -566,6 +570,8 @@ void job_dismiss(Job **job, Error **errp); * * Returns 0 if the job is successfully completed, -ECANCELED if the job w= as * cancelled before completing, and -errno in other error cases. + * + * Callers must hold the AioContext lock of job->aio_context. */ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error *= *errp); =20 diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c index de4c1c20aa..652d1e8359 100644 --- a/tests/test-blockjob.c +++ b/tests/test-blockjob.c @@ -223,6 +223,10 @@ static void cancel_common(CancelJob *s) BlockJob *job =3D &s->common; BlockBackend *blk =3D s->blk; JobStatus sts =3D job->job.status; + AioContext *ctx; + + ctx =3D job->job.aio_context; + aio_context_acquire(ctx); =20 job_cancel_sync(&job->job); if (sts !=3D JOB_STATUS_CREATED && sts !=3D JOB_STATUS_CONCLUDED) { @@ -232,6 +236,8 @@ static void cancel_common(CancelJob *s) assert(job->job.status =3D=3D JOB_STATUS_NULL); job_unref(&job->job); destroy_blk(blk); + + aio_context_release(ctx); } =20 static void test_cancel_created(void) --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843687998573.2658197115323; Thu, 13 Sep 2018 06:01:27 -0700 (PDT) Received: from localhost ([::1]:42309 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RFI-0008Ru-9y for importer@patchew.org; Thu, 13 Sep 2018 09:01:20 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46726) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R79-0002EL-Mm for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R78-0001pu-Ss for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:52:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42642) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R77-0001lO-3R; Thu, 13 Sep 2018 08:52:53 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6A0B030820DE; Thu, 13 Sep 2018 12:52:52 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 926B160BE0; Thu, 13 Sep 2018 12:52:50 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:06 +0200 Message-Id: <20180913125217.23173-7-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Thu, 13 Sep 2018 12:52:52 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" job_finish_sync() needs to release the AioContext lock of the job before calling aio_poll(). Otherwise, callbacks called by aio_poll() would possibly take the lock a second time and run into a deadlock with a nested AIO_WAIT_WHILE() call. Also, job_drain() without aio_poll() isn't necessarily enough to make progress on a job, it could depend on bottom halves to be executed. Combine both open-coded while loops into a single AIO_WAIT_WHILE() call that solves both of these problems. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng Reviewed-by: Max Reitz --- job.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/job.c b/job.c index 48a767c102..fa74558ba0 100644 --- a/job.c +++ b/job.c @@ -29,6 +29,7 @@ #include "qemu/job.h" #include "qemu/id.h" #include "qemu/main-loop.h" +#include "block/aio-wait.h" #include "trace-root.h" #include "qapi/qapi-events-job.h" =20 @@ -962,6 +963,7 @@ void job_complete(Job *job, Error **errp) int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error *= *errp) { Error *local_err =3D NULL; + AioWait dummy_wait =3D {}; int ret; =20 job_ref(job); @@ -974,14 +976,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, E= rror **errp), Error **errp) job_unref(job); return -EBUSY; } - /* job_drain calls job_enter, and it should be enough to induce progre= ss - * until the job completes or moves to the main thread. */ - while (!job->deferred_to_main_loop && !job_is_completed(job)) { - job_drain(job); - } - while (!job_is_completed(job)) { - aio_poll(qemu_get_aio_context(), true); - } + + AIO_WAIT_WHILE(&dummy_wait, job->aio_context, + (job_drain(job), !job_is_completed(job))); + ret =3D (job_is_cancelled(job) && job->ret =3D=3D 0) ? -ECANCELED : jo= b->ret; job_unref(job); return ret; --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843910175232.32741040932535; Thu, 13 Sep 2018 06:05:10 -0700 (PDT) Received: from localhost ([::1]:42331 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RIz-0002q5-14 for importer@patchew.org; Thu, 13 Sep 2018 09:05:09 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46802) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7G-0002MT-4y for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:02 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7F-0001uD-Br for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43610) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R79-0001pZ-8f; Thu, 13 Sep 2018 08:52:55 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9425581DE2; Thu, 13 Sep 2018 12:52:54 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id B621360BE0; Thu, 13 Sep 2018 12:52:52 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:07 +0200 Message-Id: <20180913125217.23173-8-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 13 Sep 2018 12:52:54 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 07/17] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This is a regression test for a deadlock that occurred in block job completion callbacks (via job_defer_to_main_loop) because the AioContext lock was taken twice: once in job_finish_sync() and then again in job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng --- tests/test-bdrv-drain.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index 57da22a096..c3c17b9ff7 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -774,6 +774,15 @@ typedef struct TestBlockJob { bool should_complete; } TestBlockJob; =20 +static int test_job_prepare(Job *job) +{ + TestBlockJob *s =3D container_of(job, TestBlockJob, common.job); + + /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */ + blk_flush(s->common.blk); + return 0; +} + static int coroutine_fn test_job_run(Job *job, Error **errp) { TestBlockJob *s =3D container_of(job, TestBlockJob, common.job); @@ -804,6 +813,7 @@ BlockJobDriver test_job_driver =3D { .drain =3D block_job_drain, .run =3D test_job_run, .complete =3D test_job_complete, + .prepare =3D test_job_prepare, }, }; =20 --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843898275201.05488904290917; Thu, 13 Sep 2018 06:04:58 -0700 (PDT) Received: from localhost ([::1]:42328 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RIn-0002i6-2d for importer@patchew.org; Thu, 13 Sep 2018 09:04:57 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46816) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7G-0002Mx-FA for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7F-0001uU-Hs for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43600) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7B-0001r4-C9; Thu, 13 Sep 2018 08:52:57 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B9AC83082A51; Thu, 13 Sep 2018 12:52:56 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id DFECA60BE0; Thu, 13 Sep 2018 12:52:54 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:08 +0200 Message-Id: <20180913125217.23173-9-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Thu, 13 Sep 2018 12:52:56 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" bdrv_do_drained_begin/end() assume that they are called with the AioContext lock of bs held. If we call drain functions from a coroutine with the AioContext lock held, we yield and schedule a BH to move out of coroutine context. This means that the lock for the home context of the coroutine is released and must be re-acquired in the bottom half. Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz --- include/qemu/coroutine.h | 5 +++++ block/io.c | 15 +++++++++++++++ util/qemu-coroutine.c | 5 +++++ 3 files changed, 25 insertions(+) diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h index 6f8a487041..9801e7f5a4 100644 --- a/include/qemu/coroutine.h +++ b/include/qemu/coroutine.h @@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine= *co); void coroutine_fn qemu_coroutine_yield(void); =20 /** + * Get the AioContext of the given coroutine + */ +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co); + +/** * Get the currently executing coroutine */ Coroutine *coroutine_fn qemu_coroutine_self(void); diff --git a/block/io.c b/block/io.c index 7100344c7b..914ba78f1a 100644 --- a/block/io.c +++ b/block/io.c @@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque) BlockDriverState *bs =3D data->bs; =20 if (bs) { + AioContext *ctx =3D bdrv_get_aio_context(bs); + AioContext *co_ctx =3D qemu_coroutine_get_aio_context(co); + + /* + * When the coroutine yielded, the lock for its home context was + * released, so we need to re-acquire it here. If it explicitly + * acquired a different context, the lock is still held and we don= 't + * want to lock it a second time (or AIO_WAIT_WHILE() would hang). + */ + if (ctx =3D=3D co_ctx) { + aio_context_acquire(ctx); + } bdrv_dec_in_flight(bs); if (data->begin) { bdrv_do_drained_begin(bs, data->recursive, data->parent, @@ -296,6 +308,9 @@ static void bdrv_co_drain_bh_cb(void *opaque) bdrv_do_drained_end(bs, data->recursive, data->parent, data->ignore_bds_parents); } + if (ctx =3D=3D co_ctx) { + aio_context_release(ctx); + } } else { assert(data->begin); bdrv_drain_all_begin(); diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c index 1ba4191b84..2295928d33 100644 --- a/util/qemu-coroutine.c +++ b/util/qemu-coroutine.c @@ -198,3 +198,8 @@ bool qemu_coroutine_entered(Coroutine *co) { return co->caller; } + +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co) +{ + return co->ctx; +} --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536844103195535.2001134968917; Thu, 13 Sep 2018 06:08:23 -0700 (PDT) Received: from localhost ([::1]:42350 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RM2-0005GI-4d for importer@patchew.org; Thu, 13 Sep 2018 09:08:18 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46808) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7G-0002Mh-6x for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7F-0001uP-F7 for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56496) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7D-0001sX-HP; Thu, 13 Sep 2018 08:52:59 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DBEF788313; Thu, 13 Sep 2018 12:52:58 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0E32060BE0; Thu, 13 Sep 2018 12:52:56 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:09 +0200 Message-Id: <20180913125217.23173-10-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 13 Sep 2018 12:52:58 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 09/17] block-backend: Add .drained_poll callback X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" A bdrv_drain operation must ensure that all parents are quiesced, this includes BlockBackends. Otherwise, callbacks called by requests that are completed on the BDS layer, but not quite yet on the BlockBackend layer could still create new requests. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng Reviewed-by: Max Reitz --- block/block-backend.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index fa120630be..efa8d8011c 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -121,6 +121,7 @@ static void blk_root_inherit_options(int *child_flags, = QDict *child_options, abort(); } static void blk_root_drained_begin(BdrvChild *child); +static bool blk_root_drained_poll(BdrvChild *child); static void blk_root_drained_end(BdrvChild *child); =20 static void blk_root_change_media(BdrvChild *child, bool load); @@ -294,6 +295,7 @@ static const BdrvChildRole child_root =3D { .get_parent_desc =3D blk_root_get_parent_desc, =20 .drained_begin =3D blk_root_drained_begin, + .drained_poll =3D blk_root_drained_poll, .drained_end =3D blk_root_drained_end, =20 .activate =3D blk_root_activate, @@ -2191,6 +2193,13 @@ static void blk_root_drained_begin(BdrvChild *child) } } =20 +static bool blk_root_drained_poll(BdrvChild *child) +{ + BlockBackend *blk =3D child->opaque; + assert(blk->quiesce_counter); + return !!blk->in_flight; +} + static void blk_root_drained_end(BdrvChild *child) { BlockBackend *blk =3D child->opaque; --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843391603614.1414788385213; Thu, 13 Sep 2018 05:56:31 -0700 (PDT) Received: from localhost ([::1]:42284 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RAc-0004lh-He for importer@patchew.org; Thu, 13 Sep 2018 08:56:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46870) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7K-0002PE-F3 for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:11 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7J-0001xZ-Jb for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:06 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46576) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7F-0001u4-Lr; Thu, 13 Sep 2018 08:53:01 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0BA92308338E; Thu, 13 Sep 2018 12:53:01 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 34B8760BE0; Thu, 13 Sep 2018 12:52:59 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:10 +0200 Message-Id: <20180913125217.23173-11-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Thu, 13 Sep 2018 12:53:01 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" blk_unref() first decreases the refcount of the BlockBackend and calls blk_delete() if the refcount reaches zero. Requests can still be in flight at this point, they are only drained during blk_delete(): At this point, arbitrary callbacks can run. If any callback takes a temporary BlockBackend reference, it will first increase the refcount to 1 and then decrease it to 0 again, triggering another blk_delete(). This will cause a use-after-free crash in the outer blk_delete(). Fix it by draining the BlockBackend before decreasing to refcount to 0. Assert in blk_ref() that it never takes the first refcount (which would mean that the BlockBackend is already being deleted). Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng Reviewed-by: Max Reitz --- block/block-backend.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/block/block-backend.c b/block/block-backend.c index efa8d8011c..1b2d7a6ff5 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -435,6 +435,7 @@ int blk_get_refcnt(BlockBackend *blk) */ void blk_ref(BlockBackend *blk) { + assert(blk->refcnt > 0); blk->refcnt++; } =20 @@ -447,7 +448,11 @@ void blk_unref(BlockBackend *blk) { if (blk) { assert(blk->refcnt > 0); - if (!--blk->refcnt) { + if (blk->refcnt > 1) { + blk->refcnt--; + } else { + blk_drain(blk); + blk->refcnt =3D 0; blk_delete(blk); } } --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843579925987.8465587722347; Thu, 13 Sep 2018 05:59:39 -0700 (PDT) Received: from localhost ([::1]:42298 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RDe-0007F2-Pm for importer@patchew.org; Thu, 13 Sep 2018 08:59:38 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46961) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7U-0002Wg-56 for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7R-00026L-BV for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49330) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7P-00023g-Lh; Thu, 13 Sep 2018 08:53:11 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 03949308FC50; Thu, 13 Sep 2018 12:53:11 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5929E60BE0; Thu, 13 Sep 2018 12:53:01 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:11 +0200 Message-Id: <20180913125217.23173-12-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Thu, 13 Sep 2018 12:53:11 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Request callbacks can do pretty much anything, including operations that will yield from the coroutine (such as draining the backend). In that case, a decreased in_flight would be visible to other code and could lead to a drain completing while the callback hasn't actually completed yet. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng Reviewed-by: Max Reitz --- block/block-backend.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/block/block-backend.c b/block/block-backend.c index 1b2d7a6ff5..2efaf0c968 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1338,8 +1338,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info =3D { static void blk_aio_complete(BlkAioEmAIOCB *acb) { if (acb->has_returned) { - blk_dec_in_flight(acb->rwco.blk); + if (qemu_get_current_aio_context() =3D=3D qemu_get_aio_context()) { + /* If we are in the main thread, the callback is allowed to un= ref + * the BlockBackend, so we have to hold an additional referenc= e */ + blk_ref(acb->rwco.blk); + } acb->common.cb(acb->common.opaque, acb->rwco.ret); + blk_dec_in_flight(acb->rwco.blk); + if (qemu_get_current_aio_context() =3D=3D qemu_get_aio_context()) { + blk_unref(acb->rwco.blk); + } qemu_aio_unref(acb); } } --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536844292386603.0916922440285; Thu, 13 Sep 2018 06:11:32 -0700 (PDT) Received: from localhost ([::1]:42376 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RP9-0007xL-1a for importer@patchew.org; Thu, 13 Sep 2018 09:11:31 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47065) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7a-0002cR-2L for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7U-00029A-VH for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:21 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43774) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7S-00026m-DI; Thu, 13 Sep 2018 08:53:14 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B0FEB30E5F83; Thu, 13 Sep 2018 12:53:13 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5CB1760BE5; Thu, 13 Sep 2018 12:53:11 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:12 +0200 Message-Id: <20180913125217.23173-13-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Thu, 13 Sep 2018 12:53:13 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" When starting an active commit job, other callbacks can run before mirror_start_job() calls bdrv_ref() where needed and cause the nodes to go away. Add another pair of bdrv_ref/unref() around it to protect against this case. Signed-off-by: Kevin Wolf --- block/mirror.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/block/mirror.c b/block/mirror.c index 56d9ef7474..c8657991cf 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1697,7 +1697,14 @@ void commit_active_start(const char *job_id, BlockDr= iverState *bs, =20 orig_base_flags =3D bdrv_get_flags(base); =20 + /* bdrv_reopen() drains, which might make the BDSes go away before a + * reference is taken in mirror_start_job(). */ + bdrv_ref(bs); + bdrv_ref(base); + if (bdrv_reopen(base, bs->open_flags, errp)) { + bdrv_unref(bs); + bdrv_unref(base); return; } =20 @@ -1707,6 +1714,10 @@ void commit_active_start(const char *job_id, BlockDr= iverState *bs, &commit_active_job_driver, false, base, auto_complete, filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND, &local_err); + + bdrv_unref(bs); + bdrv_unref(base); + if (local_err) { error_propagate(errp, local_err); goto error_restore_flags; --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536844437422215.2837670446262; Thu, 13 Sep 2018 06:13:57 -0700 (PDT) Received: from localhost ([::1]:42386 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RRU-0001PQ-8G for importer@patchew.org; Thu, 13 Sep 2018 09:13:56 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47093) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7b-0002dT-6M for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7a-0002EU-3o for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58143) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7U-00028X-Ih; Thu, 13 Sep 2018 08:53:16 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DFA7780F9F; Thu, 13 Sep 2018 12:53:15 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0E31360BE5; Thu, 13 Sep 2018 12:53:13 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:13 +0200 Message-Id: <20180913125217.23173-14-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Thu, 13 Sep 2018 12:53:15 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 13/17] blockjob: Lie better in child_job_drained_poll() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Block jobs claim in .drained_poll() that they are in a quiescent state as soon as job->deferred_to_main_loop is true. This is obviously wrong, they still have a completion BH to run. We only get away with this because commit 91af091f923 added an unconditional aio_poll(false) to the drain functions, but this is bypassing the regular drain mechanisms. However, just removing this and telling that the job is still active doesn't work either: The completion callbacks themselves call drain functions (directly, or indirectly with bdrv_reopen), so they would deadlock then. As a better lie, tell that the job is active as long as the BH is pending, but falsely call it quiescent from the point in the BH when the completion callback is called. At this point, nested drain calls won't deadlock because they ignore the job, and outer drains will wait for the job to really reach a quiescent state because the callback is already running. Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz --- include/qemu/job.h | 3 +++ blockjob.c | 2 +- job.c | 11 ++++++++++- 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/qemu/job.h b/include/qemu/job.h index 63c60ef1ba..9e7cd1e4a0 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -76,6 +76,9 @@ typedef struct Job { * Set to false by the job while the coroutine has yielded and may be * re-entered by job_enter(). There may still be I/O or event loop act= ivity * pending. Accessed under block_job_mutex (in blockjob.c). + * + * When the job is deferred to the main loop, busy is true as long as = the + * bottom half is still pending. */ bool busy; =20 diff --git a/blockjob.c b/blockjob.c index 8d27e8e1ea..617d86fe93 100644 --- a/blockjob.c +++ b/blockjob.c @@ -164,7 +164,7 @@ static bool child_job_drained_poll(BdrvChild *c) /* An inactive or completed job doesn't have any pending requests. Jobs * with !job->busy are either already paused or have a pause point aft= er * being reentered, so no job driver code will run before they pause. = */ - if (!job->busy || job_is_completed(job) || job->deferred_to_main_loop)= { + if (!job->busy || job_is_completed(job)) { return false; } =20 diff --git a/job.c b/job.c index fa74558ba0..00a1cd128d 100644 --- a/job.c +++ b/job.c @@ -857,7 +857,16 @@ static void job_exit(void *opaque) AioContext *ctx =3D job->aio_context; =20 aio_context_acquire(ctx); + + /* This is a lie, we're not quiescent, but still doing the completion + * callbacks. However, completion callbacks tend to involve operations= that + * drain block nodes, and if .drained_poll still returned true, we wou= ld + * deadlock. */ + job->busy =3D false; + job_event_idle(job); + job_completed(job); + aio_context_release(ctx); } =20 @@ -872,8 +881,8 @@ static void coroutine_fn job_co_entry(void *opaque) assert(job && job->driver && job->driver->run); job_pause_point(job); job->ret =3D job->driver->run(job, &job->err); - job_event_idle(job); job->deferred_to_main_loop =3D true; + job->busy =3D true; aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job); } =20 --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536844303073893.4539327290768; Thu, 13 Sep 2018 06:11:43 -0700 (PDT) Received: from localhost ([::1]:42378 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RPF-00083o-Tw for importer@patchew.org; Thu, 13 Sep 2018 09:11:37 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47112) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7b-0002dg-Tc for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7b-0002Ex-1A for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39440) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7W-0002AK-Nb; Thu, 13 Sep 2018 08:53:18 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1413680F6B; Thu, 13 Sep 2018 12:53:18 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 34C0160BE0; Thu, 13 Sep 2018 12:53:16 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:14 +0200 Message-Id: <20180913125217.23173-15-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Thu, 13 Sep 2018 12:53:18 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 14/17] block: Remove aio_poll() in bdrv_drain_poll variants X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" bdrv_drain_poll_top_level() was buggy because it didn't release the AioContext lock of the node to be drained before calling aio_poll(). This way, callbacks called by aio_poll() would possibly take the lock a second time and run into a deadlock with a nested AIO_WAIT_WHILE() call. However, it turns out that the aio_poll() call isn't actually needed any more. It was introduced in commit 91af091f923, which is effectively reverted by this patch. The cases it was supposed to fix are now covered by bdrv_drain_poll(), which waits for block jobs to reach a quiescent state. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng Reviewed-by: Max Reitz --- block/io.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/block/io.c b/block/io.c index 914ba78f1a..8b81ff3913 100644 --- a/block/io.c +++ b/block/io.c @@ -268,10 +268,6 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recurs= ive, static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive, BdrvChild *ignore_parent) { - /* Execute pending BHs first and check everything else only after the = BHs - * have executed. */ - while (aio_poll(bs->aio_context, false)); - return bdrv_drain_poll(bs, recursive, ignore_parent, false); } =20 @@ -511,10 +507,6 @@ static bool bdrv_drain_all_poll(void) BlockDriverState *bs =3D NULL; bool result =3D false; =20 - /* Execute pending BHs first (may modify the graph) and check everythi= ng - * else only after the BHs have executed. */ - while (aio_poll(qemu_get_aio_context(), false)); - /* bdrv_drain_poll() can't make changes to the graph and we are holdin= g the * main AioContext lock, so iterating bdrv_next_all_states() is safe. = */ while ((bs =3D bdrv_next_all_states(bs))) { --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536844127407662.86690887016; Thu, 13 Sep 2018 06:08:47 -0700 (PDT) Received: from localhost ([::1]:42352 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RMU-0005eS-47 for importer@patchew.org; Thu, 13 Sep 2018 09:08:46 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47114) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7c-0002dh-0F for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7b-0002FC-7T for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57008) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7Y-0002Cw-W4; Thu, 13 Sep 2018 08:53:21 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3ADC9300194F; Thu, 13 Sep 2018 12:53:20 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5F8BD60BE0; Thu, 13 Sep 2018 12:53:18 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:15 +0200 Message-Id: <20180913125217.23173-16-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Thu, 13 Sep 2018 12:53:20 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 15/17] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This is a regression test for a deadlock that could occur in callbacks called from the aio_poll() in bdrv_drain_poll_top_level(). The AioContext lock wasn't released and therefore would be taken a second time in the callback. This would cause a possible AIO_WAIT_WHILE() in the callback to hang. Signed-off-by: Kevin Wolf Reviewed-by: Fam Zheng --- tests/test-bdrv-drain.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index c3c17b9ff7..e105c0ae84 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -636,6 +636,17 @@ static void test_iothread_aio_cb(void *opaque, int ret) qemu_event_set(&done_event); } =20 +static void test_iothread_main_thread_bh(void *opaque) +{ + struct test_iothread_data *data =3D opaque; + + /* Test that the AioContext is not yet locked in a random BH that is + * executed during drain, otherwise this would deadlock. */ + aio_context_acquire(bdrv_get_aio_context(data->bs)); + bdrv_flush(data->bs); + aio_context_release(bdrv_get_aio_context(data->bs)); +} + /* * Starts an AIO request on a BDS that runs in the AioContext of iothread = 1. * The request involves a BH on iothread 2 before it can complete. @@ -705,6 +716,8 @@ static void test_iothread_common(enum drain_type drain_= type, int drain_thread) aio_context_acquire(ctx_a); } =20 + aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data= ); + /* The request is running on the IOThread a. Draining its block de= vice * will make sure that it has completed as far as the BDS is conce= rned, * but the drain in this thread can continue immediately after --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843729706235.24116890184985; Thu, 13 Sep 2018 06:02:09 -0700 (PDT) Received: from localhost ([::1]:42318 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RG4-0000Zs-GT for importer@patchew.org; Thu, 13 Sep 2018 09:02:08 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47244) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7o-0002pE-LN for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7n-0002Tk-Py for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:36 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35562) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7h-0002MG-KF; Thu, 13 Sep 2018 08:53:29 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id F14D0310E981; Thu, 13 Sep 2018 12:53:28 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 866C060BE0; Thu, 13 Sep 2018 12:53:20 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:16 +0200 Message-Id: <20180913125217.23173-17-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Thu, 13 Sep 2018 12:53:29 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 16/17] job: Avoid deadlocks in job_completed_txn_abort() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Amongst others, job_finalize_single() calls the .prepare/.commit/.abort callbacks of the individual job driver. Recently, their use was adapted for all block jobs so that they involve code calling AIO_WAIT_WHILE() now. Such code must be called under the AioContext lock for the respective job, but without holding any other AioContext lock. Signed-off-by: Kevin Wolf --- job.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/job.c b/job.c index 00a1cd128d..0b021867da 100644 --- a/job.c +++ b/job.c @@ -718,6 +718,7 @@ static void job_cancel_async(Job *job, bool force) =20 static void job_completed_txn_abort(Job *job) { + AioContext *outer_ctx =3D job->aio_context; AioContext *ctx; JobTxn *txn =3D job->txn; Job *other_job; @@ -731,23 +732,26 @@ static void job_completed_txn_abort(Job *job) txn->aborting =3D true; job_txn_ref(txn); =20 - /* We are the first failed job. Cancel other jobs. */ - QLIST_FOREACH(other_job, &txn->jobs, txn_list) { - ctx =3D other_job->aio_context; - aio_context_acquire(ctx); - } + /* We can only hold the single job's AioContext lock while calling + * job_finalize_single() because the finalization callbacks can involve + * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */ + aio_context_release(outer_ctx); =20 /* Other jobs are effectively cancelled by us, set the status for * them; this job, however, may or may not be cancelled, depending * on the caller, so leave it. */ QLIST_FOREACH(other_job, &txn->jobs, txn_list) { if (other_job !=3D job) { + ctx =3D other_job->aio_context; + aio_context_acquire(ctx); job_cancel_async(other_job, false); + aio_context_release(ctx); } } while (!QLIST_EMPTY(&txn->jobs)) { other_job =3D QLIST_FIRST(&txn->jobs); ctx =3D other_job->aio_context; + aio_context_acquire(ctx); if (!job_is_completed(other_job)) { assert(job_is_cancelled(other_job)); job_finish_sync(other_job, NULL, NULL); @@ -756,6 +760,8 @@ static void job_completed_txn_abort(Job *job) aio_context_release(ctx); } =20 + aio_context_acquire(outer_ctx); + job_txn_unref(txn); } =20 --=20 2.13.6 From nobody Wed Nov 5 18:26:51 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536843803492786.8457546986743; Thu, 13 Sep 2018 06:03:23 -0700 (PDT) Received: from localhost ([::1]:42324 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0RHG-0001YN-3j for importer@patchew.org; Thu, 13 Sep 2018 09:03:22 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47254) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g0R7p-0002q7-D2 for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:38 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g0R7o-0002U1-4E for qemu-devel@nongnu.org; Thu, 13 Sep 2018 08:53:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:8072) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1g0R7j-0002QP-Ow; Thu, 13 Sep 2018 08:53:31 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 25CDC308ED70; Thu, 13 Sep 2018 12:53:31 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-67.ams2.redhat.com [10.36.117.67]) by smtp.corp.redhat.com (Postfix) with ESMTP id 49AAD60BE0; Thu, 13 Sep 2018 12:53:29 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 13 Sep 2018 14:52:17 +0200 Message-Id: <20180913125217.23173-18-kwolf@redhat.com> In-Reply-To: <20180913125217.23173-1-kwolf@redhat.com> References: <20180913125217.23173-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Thu, 13 Sep 2018 12:53:31 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 17/17] test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, famz@redhat.com, slp@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RDMRC_1 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort callbacks. Both reasons why .abort could be called for a single job are tested: Either .run or .prepare could return an error. Signed-off-by: Kevin Wolf --- tests/test-bdrv-drain.c | 116 +++++++++++++++++++++++++++++++++++++++++++-= ---- 1 file changed, 104 insertions(+), 12 deletions(-) diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index e105c0ae84..bbc56a055f 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -784,6 +784,8 @@ static void test_iothread_drain_subtree(void) =20 typedef struct TestBlockJob { BlockJob common; + int run_ret; + int prepare_ret; bool should_complete; } TestBlockJob; =20 @@ -793,7 +795,23 @@ static int test_job_prepare(Job *job) =20 /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */ blk_flush(s->common.blk); - return 0; + return s->prepare_ret; +} + +static void test_job_commit(Job *job) +{ + TestBlockJob *s =3D container_of(job, TestBlockJob, common.job); + + /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */ + blk_flush(s->common.blk); +} + +static void test_job_abort(Job *job) +{ + TestBlockJob *s =3D container_of(job, TestBlockJob, common.job); + + /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */ + blk_flush(s->common.blk); } =20 static int coroutine_fn test_job_run(Job *job, Error **errp) @@ -809,7 +827,7 @@ static int coroutine_fn test_job_run(Job *job, Error **= errp) job_pause_point(&s->common.job); } =20 - return 0; + return s->run_ret; } =20 static void test_job_complete(Job *job, Error **errp) @@ -827,14 +845,24 @@ BlockJobDriver test_job_driver =3D { .run =3D test_job_run, .complete =3D test_job_complete, .prepare =3D test_job_prepare, + .commit =3D test_job_commit, + .abort =3D test_job_abort, }, }; =20 -static void test_blockjob_common(enum drain_type drain_type, bool use_ioth= read) +enum test_job_result { + TEST_JOB_SUCCESS, + TEST_JOB_FAIL_RUN, + TEST_JOB_FAIL_PREPARE, +}; + +static void test_blockjob_common(enum drain_type drain_type, bool use_ioth= read, + enum test_job_result result) { BlockBackend *blk_src, *blk_target; BlockDriverState *src, *target; BlockJob *job; + TestBlockJob *tjob; IOThread *iothread =3D NULL; AioContext *ctx; int ret; @@ -858,9 +886,23 @@ static void test_blockjob_common(enum drain_type drain= _type, bool use_iothread) blk_insert_bs(blk_target, target, &error_abort); =20 aio_context_acquire(ctx); - job =3D block_job_create("job0", &test_job_driver, NULL, src, 0, BLK_P= ERM_ALL, - 0, 0, NULL, NULL, &error_abort); + tjob =3D block_job_create("job0", &test_job_driver, NULL, src, + 0, BLK_PERM_ALL, + 0, 0, NULL, NULL, &error_abort); + job =3D &tjob->common; block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abor= t); + + switch (result) { + case TEST_JOB_SUCCESS: + break; + case TEST_JOB_FAIL_RUN: + tjob->run_ret =3D -EIO; + break; + case TEST_JOB_FAIL_PREPARE: + tjob->prepare_ret =3D -EIO; + break; + } + job_start(&job->job); aio_context_release(ctx); =20 @@ -918,7 +960,7 @@ static void test_blockjob_common(enum drain_type drain_= type, bool use_iothread) =20 aio_context_acquire(ctx); ret =3D job_complete_sync(&job->job, &error_abort); - g_assert_cmpint(ret, =3D=3D, 0); + g_assert_cmpint(ret, =3D=3D, (result =3D=3D TEST_JOB_SUCCESS ? 0 : -EI= O)); =20 if (use_iothread) { blk_set_aio_context(blk_src, qemu_get_aio_context()); @@ -937,32 +979,68 @@ static void test_blockjob_common(enum drain_type drai= n_type, bool use_iothread) =20 static void test_blockjob_drain_all(void) { - test_blockjob_common(BDRV_DRAIN_ALL, false); + test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_SUCCESS); } =20 static void test_blockjob_drain(void) { - test_blockjob_common(BDRV_DRAIN, false); + test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_SUCCESS); } =20 static void test_blockjob_drain_subtree(void) { - test_blockjob_common(BDRV_SUBTREE_DRAIN, false); + test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_SUCCESS); +} + +static void test_blockjob_error_drain_all(void) +{ + test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_FAIL_RUN); + test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_FAIL_PREPARE); +} + +static void test_blockjob_error_drain(void) +{ + test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_FAIL_RUN); + test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_FAIL_PREPARE); +} + +static void test_blockjob_error_drain_subtree(void) +{ + test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_FAIL_RUN); + test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_FAIL_PREPARE); } =20 static void test_blockjob_iothread_drain_all(void) { - test_blockjob_common(BDRV_DRAIN_ALL, true); + test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_SUCCESS); } =20 static void test_blockjob_iothread_drain(void) { - test_blockjob_common(BDRV_DRAIN, true); + test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_SUCCESS); } =20 static void test_blockjob_iothread_drain_subtree(void) { - test_blockjob_common(BDRV_SUBTREE_DRAIN, true); + test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_SUCCESS); +} + +static void test_blockjob_iothread_error_drain_all(void) +{ + test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_FAIL_RUN); + test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_FAIL_PREPARE); +} + +static void test_blockjob_iothread_error_drain(void) +{ + test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_FAIL_RUN); + test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_FAIL_PREPARE); +} + +static void test_blockjob_iothread_error_drain_subtree(void) +{ + test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_FAIL_RUN); + test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_FAIL_PREPARE); } =20 =20 @@ -1434,6 +1512,13 @@ int main(int argc, char **argv) g_test_add_func("/bdrv-drain/blockjob/drain_subtree", test_blockjob_drain_subtree); =20 + g_test_add_func("/bdrv-drain/blockjob/error/drain_all", + test_blockjob_error_drain_all); + g_test_add_func("/bdrv-drain/blockjob/error/drain", + test_blockjob_error_drain); + g_test_add_func("/bdrv-drain/blockjob/error/drain_subtree", + test_blockjob_error_drain_subtree); + g_test_add_func("/bdrv-drain/blockjob/iothread/drain_all", test_blockjob_iothread_drain_all); g_test_add_func("/bdrv-drain/blockjob/iothread/drain", @@ -1441,6 +1526,13 @@ int main(int argc, char **argv) g_test_add_func("/bdrv-drain/blockjob/iothread/drain_subtree", test_blockjob_iothread_drain_subtree); =20 + g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain_all", + test_blockjob_iothread_error_drain_all); + g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain", + test_blockjob_iothread_error_drain); + g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain_subtree", + test_blockjob_iothread_error_drain_subtree); + g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain); g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_a= ll); g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain); --=20 2.13.6