From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Date: Mon, 18 Jun 2018 18:44:36 +0200
Message-Id: <20180618164504.24488-8-kwolf@redhat.com>
In-Reply-To: <20180618164504.24488-1-kwolf@redhat.com>
References: <20180618164504.24488-1-kwolf@redhat.com>
Subject: [Qemu-devel] [PULL 07/35] block: Really pause block jobs on drain

We already request that block jobs be paused in the .drained_begin
parent callback, but so far nothing guaranteed that the job was actually
inactive by the time bdrv_drained_begin() returned.

This introduces a new callback BdrvChildRole.drained_poll() and uses it
to make bdrv_drain_poll() take block jobs that use the node into account
when deciding whether the node is drained.

For the test case to work as expected, we have to switch from
job_sleep_ns() to qemu_co_sleep_ns() so that the test job is considered
active at all and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
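
Illustrative sketch (not part of the patch): the mechanism is a poll
callback that each parent implements and that the drain code keeps
evaluating, driving the event loop in between, until no parent reports
activity any more. A standalone simplification, with Child standing in
for QEMU's BdrvChild/BdrvChildRole and drain_until_quiescent() for the
BDRV_POLL_WHILE() machinery:

#include <stdbool.h>
#include <stdio.h>

typedef struct Child Child;
struct Child {
    bool (*drained_poll)(Child *c); /* NULL: nothing to wait for */
    int pending;                    /* stand-in for in-flight requests */
};

static bool child_drained_poll(Child *c)
{
    return c->pending > 0;          /* busy while requests are in flight */
}

static void drain_until_quiescent(Child **parents, int n)
{
    bool busy;
    do {
        busy = false;
        for (int i = 0; i < n; i++) {
            Child *c = parents[i];
            if (c->drained_poll && c->drained_poll(c)) {
                busy = true;
                c->pending--;       /* simulates progress; the real code
                                       runs aio_poll() here instead */
            }
        }
    } while (busy);
}

int main(void)
{
    Child job = { child_drained_poll, 3 };
    Child *parents[] = { &job };

    drain_until_quiescent(parents, 1);
    printf("all parents quiescent\n");
    return 0;
}
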
 include/block/block.h        |  8 ++++++++
 include/block/block_int.h    |  7 +++++++
 include/block/blockjob_int.h |  8 ++++++++
 block.c                      |  9 +++++++++
 block/io.c                   | 40 ++++++++++++++++++++++++++++++++++------
 block/mirror.c               |  8 ++++++++
 blockjob.c                   | 23 +++++++++++++++++++++++
 tests/test-bdrv-drain.c      | 18 ++++++++++--------
 8 files changed, 107 insertions(+), 14 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index e677080c4e..cebbb39c6c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -568,6 +568,14 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore);
 void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore);
 
 /**
+ * bdrv_drain_poll:
+ *
+ * Poll for pending requests in @bs and its parents (except for
+ * @ignore_parent). This is part of bdrv_drained_begin.
+ */
+bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent);
+
+/**
  * bdrv_drained_begin:
  *
  * Begin a quiesced section for exclusive access to the BDS, by disabling
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 327e478a73..1b811db8ec 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -605,6 +605,13 @@ struct BdrvChildRole {
     void (*drained_begin)(BdrvChild *child);
     void (*drained_end)(BdrvChild *child);
 
+    /*
+     * Returns whether the parent has pending requests for the child. This
+     * callback is polled after .drained_begin() has been called until all
+     * activity on the child has stopped.
+     */
+    bool (*drained_poll)(BdrvChild *child);
+
     /* Notifies the parent that the child has been activated/inactivated (e.g.
      * when migration is completing) and it can start/stop requesting
      * permissions and doing I/O on it. */
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 5cd50c6639..e4a318dd15 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -39,6 +39,14 @@ struct BlockJobDriver {
     JobDriver job_driver;
 
     /*
+     * Returns whether the job has pending requests for the child or will
+     * submit new requests before the next pause point. This callback is
+     * polled in the context of draining a job node after requesting that
+     * the job be paused, until all activity on the child has stopped.
+     */
+    bool (*drained_poll)(BlockJob *job);
+
+    /*
      * If the callback is not NULL, it will be invoked before the job is
      * resumed in a new AioContext.  This is the place to move any resources
      * besides job->blk to the new AioContext.
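
Sketch of how a job driver is expected to wire up the new
BlockJobDriver.drained_poll callback. MyJob and its in_flight counter
are hypothetical names, shown as a fragment that would build against the
QEMU tree of this series (not standalone); the pattern matches the
mirror changes further down:

#include "qemu/osdep.h"
#include "block/blockjob_int.h"

typedef struct MyJob {
    BlockJob common;
    int in_flight;                 /* requests submitted by the job itself */
} MyJob;

static bool my_job_drained_poll(BlockJob *job)
{
    MyJob *s = container_of(job, MyJob, common);

    /* Keep bdrv_drained_begin() polling until our own requests are done */
    return s->in_flight > 0;
}

static const BlockJobDriver my_job_driver = {
    .job_driver = {
        .instance_size = sizeof(MyJob),
    },
    .drained_poll = my_job_drained_poll,
};
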
diff --git a/block.c b/block.c
index afe30caac3..8cf9cd8855 100644
--- a/block.c
+++ b/block.c
@@ -821,6 +821,12 @@ static void bdrv_child_cb_drained_begin(BdrvChild *child)
     bdrv_drained_begin(bs);
 }
 
+static bool bdrv_child_cb_drained_poll(BdrvChild *child)
+{
+    BlockDriverState *bs = child->opaque;
+    return bdrv_drain_poll(bs, NULL);
+}
+
 static void bdrv_child_cb_drained_end(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
@@ -905,6 +911,7 @@ const BdrvChildRole child_file = {
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_options,
     .drained_begin = bdrv_child_cb_drained_begin,
+    .drained_poll = bdrv_child_cb_drained_poll,
     .drained_end = bdrv_child_cb_drained_end,
     .attach = bdrv_child_cb_attach,
     .detach = bdrv_child_cb_detach,
@@ -929,6 +936,7 @@ const BdrvChildRole child_format = {
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_fmt_options,
     .drained_begin = bdrv_child_cb_drained_begin,
+    .drained_poll = bdrv_child_cb_drained_poll,
     .drained_end = bdrv_child_cb_drained_end,
     .attach = bdrv_child_cb_attach,
     .detach = bdrv_child_cb_detach,
@@ -1048,6 +1056,7 @@ const BdrvChildRole child_backing = {
     .detach = bdrv_backing_detach,
     .inherit_options = bdrv_backing_options,
     .drained_begin = bdrv_child_cb_drained_begin,
+    .drained_poll = bdrv_child_cb_drained_poll,
     .drained_end = bdrv_child_cb_drained_end,
     .inactivate = bdrv_child_cb_inactivate,
     .update_filename = bdrv_backing_update_filename,
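
The @ignore_parent argument threaded through the io.c changes below
matters for recursion: when drain descends from a node into one of its
children, polling the child must skip the BdrvChild edge it came
through, since that parent is the one doing the draining. A standalone
simplification of that graph walk (Node/Edge stand in for
BlockDriverState/BdrvChild; in the real code the upward step goes
through the role's drained_poll callback):

#include <stdbool.h>

typedef struct Node Node;
typedef struct Edge {
    Node *parent;
} Edge;

struct Node {
    int in_flight;
    Edge *parents[4];
    int n_parents;
};

/* Counterpart of bdrv_drain_poll(bs, ignore_parent): pending requests in
 * @bs or any of its parents, except through @ignore_parent. */
static bool drain_poll(Node *bs, Edge *ignore_parent)
{
    for (int i = 0; i < bs->n_parents; i++) {
        Edge *e = bs->parents[i];
        if (e == ignore_parent) {
            continue;              /* the drain originated here */
        }
        if (drain_poll(e->parent, NULL)) {
            return true;
        }
    }
    return bs->in_flight > 0;
}

int main(void)
{
    Node parent = { .in_flight = 1 };
    Edge edge = { .parent = &parent };
    Node child = { .in_flight = 0, .parents = { &edge }, .n_parents = 1 };

    /* Polled from the parent's drain: the edge back to it is skipped,
     * so the child reports quiescent (exit code 0) */
    return drain_poll(&child, &edge) ? 1 : 0;
}
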
diff --git a/block/io.c b/block/io.c
index bc7a2d78b8..5820e73bb2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -69,6 +69,23 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
+static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore)
+{
+    BdrvChild *c, *next;
+    bool busy = false;
+
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
+        if (c == ignore) {
+            continue;
+        }
+        if (c->role->drained_poll) {
+            busy |= c->role->drained_poll(c);
+        }
+    }
+
+    return busy;
+}
+
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
     dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
@@ -183,21 +200,32 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-static bool bdrv_drain_poll(BlockDriverState *bs)
+bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent)
+{
+    if (bdrv_parent_drained_poll(bs, ignore_parent)) {
+        return true;
+    }
+
+    return atomic_read(&bs->in_flight);
+}
+
+static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
+                                      BdrvChild *ignore_parent)
 {
     /* Execute pending BHs first and check everything else only after the BHs
      * have executed. */
     while (aio_poll(bs->aio_context, false));
-    return atomic_read(&bs->in_flight);
+
+    return bdrv_drain_poll(bs, ignore_parent);
 }
 
-static bool bdrv_drain_recurse(BlockDriverState *bs)
+static bool bdrv_drain_recurse(BlockDriverState *bs, BdrvChild *parent)
 {
     BdrvChild *child, *tmp;
     bool waited;
 
     /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs));
+    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
 
     QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
         BlockDriverState *bs = child->bs;
@@ -214,7 +242,7 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
              */
             bdrv_ref(bs);
         }
-        waited |= bdrv_drain_recurse(bs);
+        waited |= bdrv_drain_recurse(bs, child);
         if (in_main_loop) {
             bdrv_unref(bs);
         }
@@ -290,7 +318,7 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
-    bdrv_drain_recurse(bs);
+    bdrv_drain_recurse(bs, parent);
 
     if (recursive) {
         bs->recursive_quiesce_counter++;
diff --git a/block/mirror.c b/block/mirror.c
index 435268bbbf..c2146c1ab3 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -964,6 +964,12 @@ static void mirror_pause(Job *job)
     mirror_wait_for_all_io(s);
 }
 
+static bool mirror_drained_poll(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    return !!s->in_flight;
+}
+
 static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
@@ -997,6 +1003,7 @@ static const BlockJobDriver mirror_job_driver = {
         .pause = mirror_pause,
         .complete = mirror_complete,
     },
+    .drained_poll = mirror_drained_poll,
     .attached_aio_context = mirror_attached_aio_context,
     .drain = mirror_drain,
 };
@@ -1012,6 +1019,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .pause = mirror_pause,
         .complete = mirror_complete,
    },
+    .drained_poll = mirror_drained_poll,
     .attached_aio_context = mirror_attached_aio_context,
     .drain = mirror_drain,
 };
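
The generic job-side policy that child_job_drained_poll() in the next
hunk implements can be summarized with plain stand-in types. FakeJob is
illustrative only, not the QEMU Job/BlockJob API:

#include <stdbool.h>

typedef struct FakeJob FakeJob;
struct FakeJob {
    bool busy;
    bool completed;
    bool deferred_to_main_loop;
    bool (*drained_poll)(FakeJob *job); /* optional driver refinement */
};

static bool fake_job_drained_poll(FakeJob *job)
{
    /* !busy jobs are paused or will pause at the next pause point, and
     * completed or deferred jobs submit no further requests. */
    if (!job->busy || job->completed || job->deferred_to_main_loop) {
        return false;
    }

    /* Otherwise assume the job is still active, unless its driver can
     * answer more precisely (as mirror does via its in_flight counter). */
    return job->drained_poll ? job->drained_poll(job) : true;
}

int main(void)
{
    FakeJob sleeping = { .busy = false };
    FakeJob running = { .busy = true };

    /* A !busy job is quiescent; a busy one without a hook counts as active */
    return (fake_job_drained_poll(&sleeping) == false &&
            fake_job_drained_poll(&running) == true) ? 0 : 1;
}
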
diff --git a/blockjob.c b/blockjob.c
index 0306533a2e..be5903aa96 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -155,6 +155,28 @@ static void child_job_drained_begin(BdrvChild *c)
     job_pause(&job->job);
 }
 
+static bool child_job_drained_poll(BdrvChild *c)
+{
+    BlockJob *bjob = c->opaque;
+    Job *job = &bjob->job;
+    const BlockJobDriver *drv = block_job_driver(bjob);
+
+    /* An inactive or completed job doesn't have any pending requests. Jobs
+     * with !job->busy are either already paused or have a pause point after
+     * being reentered, so no job driver code will run before they pause. */
+    if (!job->busy || job_is_completed(job) || job->deferred_to_main_loop) {
+        return false;
+    }
+
+    /* Otherwise, assume that it isn't fully stopped yet, but allow the job
+     * to override this assumption. */
+    if (drv->drained_poll) {
+        return drv->drained_poll(bjob);
+    } else {
+        return true;
+    }
+}
+
 static void child_job_drained_end(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
@@ -164,6 +186,7 @@ static void child_job_drained_end(BdrvChild *c)
 static const BdrvChildRole child_job = {
     .get_parent_desc = child_job_get_parent_desc,
     .drained_begin = child_job_drained_begin,
+    .drained_poll = child_job_drained_poll,
     .drained_end = child_job_drained_end,
     .stay_at_node = true,
 };
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index cc03bc171b..22d31c953e 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -686,7 +686,11 @@ static void coroutine_fn test_job_start(void *opaque)
 
     job_transition_to_ready(&s->common.job);
     while (!s->should_complete) {
-        job_sleep_ns(&s->common.job, 100000);
+        /* Avoid job_sleep_ns() because it marks the job as !busy. We
+         * want to emulate some actual activity (probably some I/O) here so
+         * that drain has to wait for this activity to stop. */
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+        job_pause_point(&s->common.job);
     }
 
     job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
@@ -733,7 +737,7 @@ static void test_blockjob_common(enum drain_type drain_type)
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
-    g_assert_false(job->job.busy); /* We're in job_sleep_ns() */
+    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin(drain_type, src);
 
@@ -743,15 +747,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     } else {
         g_assert_cmpint(job->job.pause_count, ==, 1);
     }
-    /* XXX We don't wait until the job is actually paused. Is this okay? */
-    /* g_assert_true(job->job.paused); */
+    g_assert_true(job->job.paused);
     g_assert_false(job->job.busy); /* The job is paused */
 
     do_drain_end(drain_type, src);
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
-    g_assert_false(job->job.busy); /* We're in job_sleep_ns() */
+    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin(drain_type, target);
 
@@ -761,15 +764,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     } else {
         g_assert_cmpint(job->job.pause_count, ==, 1);
     }
-    /* XXX We don't wait until the job is actually paused. Is this okay? */
-    /* g_assert_true(job->job.paused); */
+    g_assert_true(job->job.paused);
     g_assert_false(job->job.busy); /* The job is paused */
 
     do_drain_end(drain_type, target);
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
-    g_assert_false(job->job.busy); /* We're in job_sleep_ns() */
+    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     ret = job_complete_sync(&job->job, &error_abort);
     g_assert_cmpint(ret, ==, 0);
-- 
2.13.6
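
On the test change: job_sleep_ns() marks the job as !busy for the
duration of the sleep, so child_job_drained_poll() would treat a
sleeping test job as already quiescent and the strengthened assertions
would never exercise the new wait. Sleeping with qemu_co_sleep_ns()
keeps busy == true, forcing drain to wait until the explicit
job_pause_point(). A toy model of just the busy flag (stand-in types,
not the QEMU API):

#include <stdbool.h>
#include <stdio.h>

typedef struct { bool busy; } TJob;

/* job_sleep_ns() clears busy: drain considers the job quiescent */
static void sleep_via_job_sleep_ns(TJob *j) { j->busy = false; }

/* a plain coroutine sleep leaves busy set: drain has to keep polling */
static void sleep_via_co_sleep_ns(TJob *j) { j->busy = true; }

static bool drained_poll_model(TJob *j) { return j->busy; }

int main(void)
{
    TJob j = { true };

    sleep_via_job_sleep_ns(&j);
    printf("job_sleep_ns:     drain waits? %d\n", drained_poll_model(&j));

    sleep_via_co_sleep_ns(&j);
    printf("qemu_co_sleep_ns: drain waits? %d\n", drained_poll_model(&j));
    return 0;
}
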