From nobody Sun Oct 5 17:36:29 2025
From: Eric Blake
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, "open list:Network Block Dev...", Max Reitz
Subject: [PULL 2/8] block/nbd: fix drain dead-lock because of nbd reconnect-delay
Date: Thu, 8 Oct 2020 13:59:45 -0500
Message-Id: <20201008185951.1026052-3-eblake@redhat.com>
In-Reply-To: <20201008185951.1026052-1-eblake@redhat.com>
References: <20201008185951.1026052-1-eblake@redhat.com>

From: Vladimir Sementsov-Ogievskiy

We pause the reconnect process during a drained section. So, if some
requests are waiting for reconnect, we should cancel them; otherwise
they deadlock the drained section.

How to reproduce:

1. Create an image:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server:
   qemu-nbd xx

3.
   Start vm with a second nbd disk on node2, like this:

  ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
     file=/work/images/cent7.qcow2 -drive \
     driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
     -vnc :0 -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that the NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, kill the nbd server, and run dd in the guest again:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

Now QEMU is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds (see the
reconnect-delay option above) and then fail). But suddenly, the vm may
totally hang in the deadlock. You may need to increase the
reconnect-delay period to catch the dead-lock.

The VM doesn't respond because the drain dead-lock happens in the cpu
thread with the global mutex taken. That's not a good thing by itself
and is not fixed by this commit (the true way is using iothreads).
Still, this commit fixes the drain dead-lock itself.

Note: probably, we can instead continue to reconnect during the drained
section. To achieve this, we may move negotiation to the connect thread
to make it independent of the bs aio context. But expanding the drained
section doesn't seem good anyway. So, let's now fix the bug the simplest
way.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake
Signed-off-by: Eric Blake
---
 block/nbd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 9daf003bea30..912ea27be7d8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
     }
 
     nbd_co_establish_connection_cancel(bs, false);
+
+    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+        qemu_co_queue_restart_all(&s->free_sema);
+    }
 }
 
 static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
-- 
2.28.0
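
For readers who want to see why the five added lines unblock the drain,
here is a toy model in plain C. It is not QEMU code: ToyClient,
toy_submit, toy_drain_begin and toy_scenario are hypothetical names, and
the integer counters only stand in for the coroutine queue s->free_sema.
The sketch assumes that a drain can finish only once no request is
parked waiting for a reconnect, and that reconnect makes no progress
while the section is drained.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the connection states named in the patch. */
typedef enum {
    TOY_CONNECTED,
    TOY_CONNECTING_WAIT,   /* requests park until reconnect or timeout */
    TOY_CONNECTING_NOWAIT, /* requests fail immediately */
} ToyState;

typedef struct {
    ToyState state;
    int parked; /* requests waiting for the connection */
    int failed; /* requests completed with an error */
} ToyClient;

/* Submit one request; what happens depends on the current state. */
static void toy_submit(ToyClient *c)
{
    if (c->state == TOY_CONNECTING_WAIT) {
        c->parked++;
    } else if (c->state == TOY_CONNECTING_NOWAIT) {
        c->failed++;
    }
    /* TOY_CONNECTED: the request completes normally, nothing to count. */
}

/*
 * Model of drain_begin. Reconnect is paused from here on, so parked
 * requests cannot finish on their own. With the fix, the state is
 * switched to NOWAIT and every parked request is woken so it can fail.
 */
static void toy_drain_begin(ToyClient *c, bool with_fix)
{
    if (with_fix && c->state == TOY_CONNECTING_WAIT) {
        c->state = TOY_CONNECTING_NOWAIT;
        c->failed += c->parked; /* assumption: woken requests see NOWAIT and error out */
        c->parked = 0;
    }
}

/* Returns true if the drain can finish, i.e. nothing is left parked. */
bool toy_scenario(bool with_fix)
{
    ToyClient c = { .state = TOY_CONNECTING_WAIT, .parked = 0, .failed = 0 };

    toy_submit(&c);
    toy_submit(&c);
    assert(c.parked == 2);

    toy_drain_begin(&c, with_fix);
    return c.parked == 0;
}
```

In this model toy_scenario(false) leaves two requests parked, which
corresponds to the hang described in the commit message, while
toy_scenario(true) fails them and lets the drain complete.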