From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Paolo Bonzini, "Dr. David Alan Gilbert", qemu-block@nongnu.org
Date: Tue, 22 Aug 2017 13:51:13 +0100
Message-Id: <20170822125113.5025-1-stefanha@redhat.com>
Subject: [Qemu-devel] [PATCH] nbd-client: avoid spurious qio_channel_yield() re-entry

The following scenario leads to an assertion failure in
qio_channel_yield():

1. Request coroutine calls qio_channel_yield() successfully when sending
   would block on the socket.  It is now yielded.
2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because
   nbd_receive_reply() failed.
3. Request coroutine is entered and returns from qio_channel_yield().
   Note that the socket fd handler has not fired yet, so
   ioc->write_coroutine is still set.
4. Request coroutine attempts to send the request body with nbd_rwv(),
   but the socket would still block.  qio_channel_yield() is called
   again and assert(!ioc->write_coroutine) is hit.

The problem is that nbd_read_reply_entry() does not distinguish between
request coroutines that are waiting to receive a reply and those that
are not.

This patch adds a per-request bool receiving flag so that
nbd_read_reply_entry() can avoid spurious aio_co_wake() calls.

Reported-by: Dr. David Alan Gilbert
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini
Reviewed-by: Vladimir Sementsov-Ogievskiy
Tested-by: Eric Blake
---
This should fix the issue that Dave is seeing, but I'm concerned that
there are more problems in nbd-client.c.  We don't have good
abstractions for writing coroutine socket I/O code.  Something like
Go's channels would avoid manual low-level coroutine calls.

There is currently no way to cancel qio_channel_yield(), so requests
doing I/O may remain in flight indefinitely and nbd-client.c doesn't
join them...
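To make the idea concrete outside of QEMU, here is a minimal standalone
sketch of the wake-only-if-receiving guard.  It compiles on its own:
the coroutine machinery is stubbed out with printf(), and the names
(Request, wake(), enter_all()) are invented for the example, so only
the guard condition mirrors the patch:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_REQUESTS 16

typedef struct {
    void *coroutine;    /* stand-in for Coroutine *; NULL means slot free */
    bool receiving;     /* true only while blocked waiting for a reply */
} Request;

static Request requests[MAX_REQUESTS];

/* Stand-in for aio_co_wake(): just log which slot would be re-entered. */
static void wake(size_t i)
{
    printf("waking request %zu\n", i);
}

/* Shape of nbd_recv_coroutines_enter_all() after the patch: a request
 * that has a coroutine but is not in its receive wait (e.g. it is
 * yielded inside qio_channel_yield() on a blocked socket) must be left
 * alone, or it would later re-enter qio_channel_yield() with
 * ioc->write_coroutine still set and trip the assertion.
 */
static void enter_all(void)
{
    size_t i;

    for (i = 0; i < MAX_REQUESTS; i++) {
        if (requests[i].coroutine && requests[i].receiving) {
            wake(i);
        }
    }
}

int main(void)
{
    static int dummy;

    requests[0] = (Request){ &dummy, true };   /* waiting for a reply */
    requests[1] = (Request){ &dummy, false };  /* blocked sending */
    enter_all();                               /* wakes only request 0 */
    return 0;
}

Slot 1 models step 1 of the scenario above: a coroutine yielded while
sending, which the guard now skips instead of spuriously re-entering.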
 block/nbd-client.h |  7 ++++++-
 block/nbd-client.c | 35 ++++++++++++++++++++++-------------
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 1935ffbcaa..b435754b82 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -17,6 +17,11 @@
 
 #define MAX_NBD_REQUESTS 16
 
+typedef struct {
+    Coroutine *coroutine;
+    bool receiving;         /* waiting for read_reply_co? */
+} NBDClientRequest;
+
 typedef struct NBDClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -27,7 +32,7 @@ typedef struct NBDClientSession {
     Coroutine *read_reply_co;
     int in_flight;
 
-    Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
+    NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
     bool quit;
 } NBDClientSession;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 422ecb4307..c2834f6b47 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -39,8 +39,10 @@ static void nbd_recv_coroutines_enter_all(NBDClientSession *s)
     int i;
 
     for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-        if (s->recv_coroutine[i]) {
-            aio_co_wake(s->recv_coroutine[i]);
+        NBDClientRequest *req = &s->requests[i];
+
+        if (req->coroutine && req->receiving) {
+            aio_co_wake(req->coroutine);
         }
     }
 }
@@ -88,28 +90,28 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
          * one coroutine is called until the reply finishes.
          */
         i = HANDLE_TO_INDEX(s, s->reply.handle);
-        if (i >= MAX_NBD_REQUESTS || !s->recv_coroutine[i]) {
+        if (i >= MAX_NBD_REQUESTS ||
+            !s->requests[i].coroutine ||
+            !s->requests[i].receiving) {
             break;
         }
 
-        /* We're woken up by the recv_coroutine itself.  Note that there
+        /* We're woken up again by the request itself.  Note that there
         * is no race between yielding and reentering read_reply_co.  This
         * is because:
         *
-        * - if recv_coroutine[i] runs on the same AioContext, it is only
+        * - if the request runs on the same AioContext, it is only
         *   entered after we yield
         *
-        * - if recv_coroutine[i] runs on a different AioContext, reentering
+        * - if the request runs on a different AioContext, reentering
         *   read_reply_co happens through a bottom half, which can only
         *   run after we yield.
         */
-        aio_co_wake(s->recv_coroutine[i]);
+        aio_co_wake(s->requests[i].coroutine);
         qemu_coroutine_yield();
     }
 
-    if (ret < 0) {
-        s->quit = true;
-    }
+    s->quit = true;
     nbd_recv_coroutines_enter_all(s);
     s->read_reply_co = NULL;
 }
@@ -128,14 +130,17 @@ static int nbd_co_send_request(BlockDriverState *bs,
     s->in_flight++;
 
     for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-        if (s->recv_coroutine[i] == NULL) {
-            s->recv_coroutine[i] = qemu_coroutine_self();
+        if (s->requests[i].coroutine == NULL) {
             break;
         }
     }
 
     g_assert(qemu_in_coroutine());
     assert(i < MAX_NBD_REQUESTS);
+
+    s->requests[i].coroutine = qemu_coroutine_self();
+    s->requests[i].receiving = false;
+
     request->handle = INDEX_TO_HANDLE(s, i);
 
     if (s->quit) {
@@ -173,10 +178,13 @@ static void nbd_co_receive_reply(NBDClientSession *s,
                                  NBDReply *reply,
                                  QEMUIOVector *qiov)
 {
+    int i = HANDLE_TO_INDEX(s, request->handle);
     int ret;
 
     /* Wait until we're woken up by nbd_read_reply_entry.  */
+    s->requests[i].receiving = true;
     qemu_coroutine_yield();
+    s->requests[i].receiving = false;
     *reply = s->reply;
     if (reply->handle != request->handle || !s->ioc || s->quit) {
         reply->error = EIO;
@@ -186,6 +194,7 @@ static void nbd_co_receive_reply(NBDClientSession *s,
                                      NULL);
         if (ret != request->len) {
             reply->error = EIO;
+            s->quit = true;
         }
     }
 
@@ -200,7 +209,7 @@ static void nbd_coroutine_end(BlockDriverState *bs,
     NBDClientSession *s = nbd_get_client_session(bs);
     int i = HANDLE_TO_INDEX(s, request->handle);
 
-    s->recv_coroutine[i] = NULL;
+    s->requests[i].coroutine = NULL;
 
     /* Kick the read_reply_co to get the next reply.  */
     if (s->read_reply_co) {
-- 
2.13.5
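P.S.: the other half of the pattern is that the receive path arms the
flag only for the duration of its own yield, so a wake can only land
while the coroutine is actually parked waiting for a reply.  A
standalone stub of that discipline (yield_stub() stands in for
qemu_coroutine_yield(); this is a paraphrase of the shape of
nbd_co_receive_reply() after the patch, not the QEMU source):

#include <stdbool.h>
#include <stdio.h>

static bool receiving;

/* Stand-in for qemu_coroutine_yield(): while suspended here, the waker
 * (nbd_read_reply_entry() in the real code) inspects the flag. */
static void yield_stub(void)
{
    printf("yielded with receiving=%d\n", receiving);
}

static void receive_reply(void)
{
    receiving = true;   /* armed: the reply reader may wake us */
    yield_stub();
    receiving = false;  /* disarmed: later wake-alls will skip us */
}

int main(void)
{
    receive_reply();
    return 0;
}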