From: Kevin Wolf
To: qemu-block@nongnu.org
Subject: [PULL 06/28] block/rbd: migrate from aio to coroutines
Date: Fri, 9 Jul 2021 14:50:13 +0200
Message-Id: <20210709125035.191321-7-kwolf@redhat.com>
In-Reply-To: <20210709125035.191321-1-kwolf@redhat.com>
References: <20210709125035.191321-1-kwolf@redhat.com>
MIME-Version: 1.0
Cc: kwolf@redhat.com, peter.maydell@linaro.org, qemu-devel@nongnu.org
Content-Type: text/plain; charset="utf-8"

From: Peter Lieven

Signed-off-by: Peter Lieven
Reviewed-by: Ilya Dryomov
Message-Id: <20210702172356.11574-5-idryomov@gmail.com>
Signed-off-by: Kevin Wolf
---
 block/rbd.c | 252 +++++++++++++++++++---------------------------------
 1 file changed, 90 insertions(+), 162 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index e2028d3db5..380ad28861 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -78,22 +78,6 @@ typedef enum {
     RBD_AIO_FLUSH
 } RBDAIOCmd;
 
-typedef struct RBDAIOCB {
-    BlockAIOCB common;
-    int64_t ret;
-    QEMUIOVector *qiov;
-    RBDAIOCmd cmd;
-    int error;
-    struct BDRVRBDState *s;
-} RBDAIOCB;
-
-typedef struct RADOSCB {
-    RBDAIOCB *acb;
-    struct BDRVRBDState *s;
-    int64_t size;
-    int64_t ret;
-} RADOSCB;
-
 typedef struct BDRVRBDState {
     rados_t cluster;
     rados_ioctx_t io_ctx;
@@ -105,6 +89,13 @@ typedef struct BDRVRBDState {
     uint64_t object_size;
 } BDRVRBDState;
 
+typedef struct RBDTask {
+    BlockDriverState *bs;
+    Coroutine *co;
+    bool complete;
+    int64_t ret;
+} RBDTask;
+
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
                             BlockdevOptionsRbd *opts, bool cache,
                             const char *keypairs, const char *secretid,
@@ -337,13 +328,6 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const char *keypairs_json,
     return ret;
 }
 
-static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
-{
-    RBDAIOCB *acb = rcb->acb;
-    iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
-               acb->qiov->size - offs);
-}
-
 #ifdef LIBRBD_SUPPORTS_ENCRYPTION
 static int qemu_rbd_convert_luks_options(
         RbdEncryptionOptionsLUKSBase *luks_opts,
@@ -733,46 +717,6 @@ exit:
     return ret;
 }
 
-/*
- * This aio completion is being called from rbd_finish_bh() and runs in qemu
- * BH context.
- */
-static void qemu_rbd_complete_aio(RADOSCB *rcb)
-{
-    RBDAIOCB *acb = rcb->acb;
-    int64_t r;
-
-    r = rcb->ret;
-
-    if (acb->cmd != RBD_AIO_READ) {
-        if (r < 0) {
-            acb->ret = r;
-            acb->error = 1;
-        } else if (!acb->error) {
-            acb->ret = rcb->size;
-        }
-    } else {
-        if (r < 0) {
-            qemu_rbd_memset(rcb, 0);
-            acb->ret = r;
-            acb->error = 1;
-        } else if (r < rcb->size) {
-            qemu_rbd_memset(rcb, r);
-            if (!acb->error) {
-                acb->ret = rcb->size;
-            }
-        } else if (!acb->error) {
-            acb->ret = r;
-        }
-    }
-
-    g_free(rcb);
-
-    acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
-
-    qemu_aio_unref(acb);
-}
-
 static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 {
     const char **vals;
@@ -1122,89 +1066,59 @@ static int qemu_rbd_resize(BlockDriverState *bs, uint64_t size)
     return 0;
 }
 
-static const AIOCBInfo rbd_aiocb_info = {
-    .aiocb_size = sizeof(RBDAIOCB),
-};
-
-static void rbd_finish_bh(void *opaque)
+static void qemu_rbd_finish_bh(void *opaque)
 {
-    RADOSCB *rcb = opaque;
-    qemu_rbd_complete_aio(rcb);
+    RBDTask *task = opaque;
+    task->complete = 1;
+    aio_co_wake(task->co);
 }
 
 /*
- * This is the callback function for rbd_aio_read and _write
+ * This is the completion callback function for all rbd aio calls
+ * started from qemu_rbd_start_co().
  *
  * Note: this function is being called from a non qemu thread so
  * we need to be careful about what we do here. Generally we only
  * schedule a BH, and do the rest of the io completion handling
- * from rbd_finish_bh() which runs in a qemu context.
+ * from qemu_rbd_finish_bh() which runs in a qemu context.
  */
-static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
+static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
 {
-    RBDAIOCB *acb = rcb->acb;
-
-    rcb->ret = rbd_aio_get_return_value(c);
+    task->ret = rbd_aio_get_return_value(c);
     rbd_aio_release(c);
-
-    replay_bh_schedule_oneshot_event(bdrv_get_aio_context(acb->common.bs),
-                                     rbd_finish_bh, rcb);
+    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
+                            qemu_rbd_finish_bh, task);
 }
 
-static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
-                                 int64_t off,
-                                 QEMUIOVector *qiov,
-                                 int64_t size,
-                                 BlockCompletionFunc *cb,
-                                 void *opaque,
-                                 RBDAIOCmd cmd)
+static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
+                                          uint64_t offset,
+                                          uint64_t bytes,
+                                          QEMUIOVector *qiov,
+                                          int flags,
+                                          RBDAIOCmd cmd)
 {
-    RBDAIOCB *acb;
-    RADOSCB *rcb = NULL;
+    BDRVRBDState *s = bs->opaque;
+    RBDTask task = { .bs = bs, .co = qemu_coroutine_self() };
     rbd_completion_t c;
     int r;
 
-    BDRVRBDState *s = bs->opaque;
-
-    acb = qemu_aio_get(&rbd_aiocb_info, bs, cb, opaque);
-    acb->cmd = cmd;
-    acb->qiov = qiov;
-    assert(!qiov || qiov->size == size);
-
-    rcb = g_new(RADOSCB, 1);
+    assert(!qiov || qiov->size == bytes);
 
-    acb->ret = 0;
-    acb->error = 0;
-    acb->s = s;
-
-    rcb->acb = acb;
-    rcb->s = acb->s;
-    rcb->size = size;
-    r = rbd_aio_create_completion(rcb, (rbd_callback_t) rbd_finish_aiocb, &c);
+    r = rbd_aio_create_completion(&task,
+                                  (rbd_callback_t) qemu_rbd_completion_cb, &c);
     if (r < 0) {
-        goto failed;
+        return r;
     }
 
     switch (cmd) {
-    case RBD_AIO_WRITE:
-        /*
-         * RBD APIs don't allow us to write more than actual size, so in order
-         * to support growing images, we resize the image before write
-         * operations that exceed the current size.
-         */
-        if (off + size > s->image_size) {
-            r = qemu_rbd_resize(bs, off + size);
-            if (r < 0) {
-                goto failed_completion;
-            }
-        }
-        r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
-        break;
     case RBD_AIO_READ:
-        r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
+        r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, offset, c);
+        break;
+    case RBD_AIO_WRITE:
+        r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, offset, c);
         break;
     case RBD_AIO_DISCARD:
-        r = rbd_aio_discard(s->image, off, size, c);
+        r = rbd_aio_discard(s->image, offset, bytes, c);
         break;
     case RBD_AIO_FLUSH:
         r = rbd_aio_flush(s->image, c);
@@ -1214,44 +1128,69 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     }
 
     if (r < 0) {
-        goto failed_completion;
+        error_report("rbd request failed early: cmd %d offset %" PRIu64
+                     " bytes %" PRIu64 " flags %d r %d (%s)", cmd, offset,
+                     bytes, flags, r, strerror(-r));
+        rbd_aio_release(c);
+        return r;
     }
-    return &acb->common;
 
-failed_completion:
-    rbd_aio_release(c);
-failed:
-    g_free(rcb);
+    while (!task.complete) {
+        qemu_coroutine_yield();
+    }
 
-    qemu_aio_unref(acb);
-    return NULL;
+    if (task.ret < 0) {
+        error_report("rbd request failed: cmd %d offset %" PRIu64 " bytes %"
+                     PRIu64 " flags %d task.ret %" PRIi64 " (%s)", cmd, offset,
+                     bytes, flags, task.ret, strerror(-task.ret));
+        return task.ret;
+    }
+
+    /* zero pad short reads */
+    if (cmd == RBD_AIO_READ && task.ret < qiov->size) {
+        qemu_iovec_memset(qiov, task.ret, 0, qiov->size - task.ret);
+    }
+
+    return 0;
+}
+
+static int
+coroutine_fn qemu_rbd_co_preadv(BlockDriverState *bs, uint64_t offset,
+                                uint64_t bytes, QEMUIOVector *qiov,
+                                int flags)
+{
+    return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_READ);
 }
 
-static BlockAIOCB *qemu_rbd_aio_preadv(BlockDriverState *bs,
-                                       uint64_t offset, uint64_t bytes,
-                                       QEMUIOVector *qiov, int flags,
-                                       BlockCompletionFunc *cb,
-                                       void *opaque)
+static int
+coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, uint64_t offset,
+                                 uint64_t bytes, QEMUIOVector *qiov,
+                                 int flags)
 {
-    return rbd_start_aio(bs, offset, qiov, bytes, cb, opaque,
-                         RBD_AIO_READ);
+    BDRVRBDState *s = bs->opaque;
+    /*
+     * RBD APIs don't allow us to write more than actual size, so in order
+     * to support growing images, we resize the image before write
+     * operations that exceed the current size.
+     */
+    if (offset + bytes > s->image_size) {
+        int r = qemu_rbd_resize(bs, offset + bytes);
+        if (r < 0) {
+            return r;
+        }
+    }
+    return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
 }
 
-static BlockAIOCB *qemu_rbd_aio_pwritev(BlockDriverState *bs,
-                                        uint64_t offset, uint64_t bytes,
-                                        QEMUIOVector *qiov, int flags,
-                                        BlockCompletionFunc *cb,
-                                        void *opaque)
+static int coroutine_fn qemu_rbd_co_flush(BlockDriverState *bs)
 {
-    return rbd_start_aio(bs, offset, qiov, bytes, cb, opaque,
-                         RBD_AIO_WRITE);
+    return qemu_rbd_start_co(bs, 0, 0, NULL, 0, RBD_AIO_FLUSH);
 }
 
-static BlockAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs,
-                                      BlockCompletionFunc *cb,
-                                      void *opaque)
+static int coroutine_fn qemu_rbd_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int count)
 {
-    return rbd_start_aio(bs, 0, NULL, 0, cb, opaque, RBD_AIO_FLUSH);
+    return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
 }
 
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
@@ -1450,16 +1389,6 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
     return snap_count;
 }
 
-static BlockAIOCB *qemu_rbd_aio_pdiscard(BlockDriverState *bs,
-                                         int64_t offset,
-                                         int bytes,
-                                         BlockCompletionFunc *cb,
-                                         void *opaque)
-{
-    return rbd_start_aio(bs, offset, NULL, bytes, cb, opaque,
-                         RBD_AIO_DISCARD);
-}
-
 static void coroutine_fn qemu_rbd_co_invalidate_cache(BlockDriverState *bs,
                                                       Error **errp)
 {
@@ -1540,11 +1469,10 @@ static BlockDriver bdrv_rbd = {
     .bdrv_co_truncate       = qemu_rbd_co_truncate,
     .protocol_name          = "rbd",
 
-    .bdrv_aio_preadv        = qemu_rbd_aio_preadv,
-    .bdrv_aio_pwritev       = qemu_rbd_aio_pwritev,
-
-    .bdrv_aio_flush         = qemu_rbd_aio_flush,
-    .bdrv_aio_pdiscard      = qemu_rbd_aio_pdiscard,
+    .bdrv_co_preadv         = qemu_rbd_co_preadv,
+    .bdrv_co_pwritev        = qemu_rbd_co_pwritev,
+    .bdrv_co_flush_to_disk  = qemu_rbd_co_flush,
+    .bdrv_co_pdiscard       = qemu_rbd_co_pdiscard,
 
     .bdrv_snapshot_create   = qemu_rbd_snap_create,
     .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
-- 
2.31.1