From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block@nongnu.org,
 qemu-stable@nongnu.org, Hanna Reitz, Stefan Hajnoczi
Subject: [PATCH v3 1/2] block-backend: prevent dangling BDS pointers across aio_poll()
Date: Tue, 11 Jan 2022 15:36:12 +0000
Message-Id: <20220111153613.25453-2-stefanha@redhat.com>
In-Reply-To: <20220111153613.25453-1-stefanha@redhat.com>
References: <20220111153613.25453-1-stefanha@redhat.com>

The BlockBackend root child can change when aio_poll() is invoked. This
happens when a temporary filter node is removed upon blockjob
completion, for example.

Functions in block/block-backend.c must be aware of this when using a
blk_bs() pointer across aio_poll() because the BlockDriverState refcnt
may reach 0, resulting in a stale pointer.

One example is scsi_device_purge_requests(), which calls blk_drain() to
wait for in-flight requests to cancel. If the backup blockjob is active,
then the BlockBackend root child is a temporary filter BDS owned by the
blockjob. The blockjob can complete during bdrv_drained_begin() and the
last reference to the BDS is released when the temporary filter node is
removed. This results in a use-after-free when blk_drain() calls
bdrv_drained_end(bs) on the dangling pointer.

Explicitly hold a reference to bs across block APIs that invoke
aio_poll().
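
The hold-a-reference pattern described above can be illustrated with a
small toy model. This is only a sketch in Python, not QEMU code: Node,
Backend, ref(), unref(), poll_completes_job() and drain() are
hypothetical stand-ins for the BlockDriverState, the BlockBackend,
bdrv_ref(), bdrv_unref(), a job completing inside aio_poll(), and
blk_drain(). It assumes the filter node's only reference is the one
dropped when the job completes.

```python
class Node:
    """Toy stand-in for a refcounted BlockDriverState (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1      # the creator owns the initial reference
        self.freed = False


def ref(node):
    assert not node.freed, "ref on a freed node"
    node.refcnt += 1


def unref(node):
    assert not node.freed, "unref on a freed node"
    node.refcnt -= 1
    if node.refcnt == 0:
        node.freed = True    # real code would release the memory here


class Backend:
    """Toy BlockBackend whose root can be swapped while polling."""

    def __init__(self, root):
        self.root = root


def poll_completes_job(backend, replacement):
    """Model a poll in which a job completes and removes its filter node."""
    old = backend.root
    backend.root = replacement
    unref(old)               # may drop the last reference to the filter


def drain(backend, replacement, hold_ref):
    """Return True if the node fetched before polling is still alive after."""
    node = backend.root      # like bs = blk_bs(blk)
    if hold_ref:
        ref(node)            # keep the node alive across the poll
    poll_completes_job(backend, replacement)   # the "drained_begin" step
    alive = not node.freed   # the "drained_end" step would touch node here
    if hold_ref:
        unref(node)
    return alive
```

Calling drain() with hold_ref=False reports that the node was already
freed by the time the drained-end step would use it, which corresponds
to the dangling pointer. With hold_ref=True the node survives until the
caller drops its own reference.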

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2021778
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2036178
Signed-off-by: Stefan Hajnoczi
---
v3:
- Add comment in blk_remove_bs() and reduce scope of bs local variable [Kevin]
v2:
- Audit block/block-backend.c and fix additional cases
---
 block/block-backend.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 12ef80ea17..23e727199b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -822,16 +822,22 @@ BlockBackend *blk_by_public(BlockBackendPublic *public)
 void blk_remove_bs(BlockBackend *blk)
 {
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
-    BlockDriverState *bs;
     BdrvChild *root;
 
     notifier_list_notify(&blk->remove_bs_notifiers, blk);
     if (tgm->throttle_state) {
-        bs = blk_bs(blk);
+        BlockDriverState *bs = blk_bs(blk);
+
+        /*
+         * Take a ref in case blk_bs() changes across bdrv_drained_begin(), for
+         * example, if a temporary filter node is removed by a blockjob.
+         */
+        bdrv_ref(bs);
         bdrv_drained_begin(bs);
         throttle_group_detach_aio_context(tgm);
         throttle_group_attach_aio_context(tgm, qemu_get_aio_context());
         bdrv_drained_end(bs);
+        bdrv_unref(bs);
     }
 
     blk_update_root_state(blk);
@@ -1705,6 +1711,7 @@ void blk_drain(BlockBackend *blk)
     BlockDriverState *bs = blk_bs(blk);
 
     if (bs) {
+        bdrv_ref(bs);
         bdrv_drained_begin(bs);
     }
 
@@ -1714,6 +1721,7 @@ void blk_drain(BlockBackend *blk)
 
     if (bs) {
         bdrv_drained_end(bs);
+        bdrv_unref(bs);
     }
 }
 
@@ -2044,10 +2052,13 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
     int ret;
 
     if (bs) {
+        bdrv_ref(bs);
+
         if (update_root_node) {
             ret = bdrv_child_try_set_aio_context(bs, new_context, blk->root,
                                                  errp);
             if (ret < 0) {
+                bdrv_unref(bs);
                 return ret;
             }
         }
@@ -2057,6 +2068,8 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
             throttle_group_attach_aio_context(tgm, new_context);
             bdrv_drained_end(bs);
         }
+
+        bdrv_unref(bs);
     }
 
     blk->ctx = new_context;
@@ -2326,11 +2339,13 @@ void blk_io_limits_disable(BlockBackend *blk)
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
     assert(tgm->throttle_state);
     if (bs) {
+        bdrv_ref(bs);
         bdrv_drained_begin(bs);
     }
     throttle_group_unregister_tgm(tgm);
     if (bs) {
         bdrv_drained_end(bs);
+        bdrv_unref(bs);
     }
 }
 
-- 
2.33.1


From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block@nongnu.org,
 qemu-stable@nongnu.org, Hanna Reitz, Stefan Hajnoczi
Subject: [PATCH v3 2/2] iotests/stream-error-on-reset: New test
Date: Tue, 11 Jan 2022 15:36:13 +0000
Message-Id: <20220111153613.25453-3-stefanha@redhat.com>
In-Reply-To: <20220111153613.25453-1-stefanha@redhat.com>
References: <20220111153613.25453-1-stefanha@redhat.com>

From: Hanna Reitz

Test the following scenario:
- Simple stream block in two-layer backing chain (base and top)
- The job is drained via blk_drain(), then an error occurs while the job
  settles the ongoing request
- And so the job completes while in blk_drain()

This was reported as a segfault, but is fixed by
"block-backend: prevent dangling BDS
pointers across aio_poll()".

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2036178
Signed-off-by: Hanna Reitz
Signed-off-by: Stefan Hajnoczi
---
 .../qemu-iotests/tests/stream-error-on-reset  | 140 ++++++++++++++++++
 .../tests/stream-error-on-reset.out           |   5 +
 2 files changed, 145 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/stream-error-on-reset
 create mode 100644 tests/qemu-iotests/tests/stream-error-on-reset.out

diff --git a/tests/qemu-iotests/tests/stream-error-on-reset b/tests/qemu-iotests/tests/stream-error-on-reset
new file mode 100755
index 0000000000..7eaedb24d7
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-error-on-reset
@@ -0,0 +1,140 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test what happens when a stream job completes in a blk_drain().
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, qemu_io_silent, QMPTestCase
+
+
+image_size = 1 * 1024 * 1024
+data_size = 64 * 1024
+base = os.path.join(iotests.test_dir, 'base.img')
+top = os.path.join(iotests.test_dir, 'top.img')
+
+
+# We want to test completing a stream job in a blk_drain().
+#
+# The blk_drain() we are going to use is a virtio-scsi device resetting,
+# which we can trigger by resetting the system.
+#
+# In order to have the block job complete on drain, we (1) throttle its
+# base image so we can start the drain after it has begun, but before it
+# completes, and (2) make it encounter an I/O error on the ensuing write.
+# (If it completes regularly, the completion happens after the drain for
+# some reason.)
+
+class TestStreamErrorOnReset(QMPTestCase):
+    def setUp(self) -> None:
+        """
+        Create two images:
+        - base image {base} with {data_size} bytes allocated
+        - top image {top} without any data allocated
+
+        And the following VM configuration:
+        - base image throttled to {data_size}
+        - top image with a blkdebug configuration so the first write access
+          to it will result in an error
+        - top image is attached to a virtio-scsi device
+        """
+        assert qemu_img_create('-f', imgfmt, base, str(image_size)) == 0
+        assert qemu_io_silent('-c', f'write 0 {data_size}', base) == 0
+        assert qemu_img_create('-f', imgfmt, top, str(image_size)) == 0
+
+        self.vm = iotests.VM()
+        self.vm.add_args('-accel', 'tcg')  # Make throttling work properly
+        self.vm.add_object(self.vm.qmp_to_opts({
+            'qom-type': 'throttle-group',
+            'id': 'thrgr',
+            'x-bps-total': str(data_size)
+        }))
+        self.vm.add_blockdev(self.vm.qmp_to_opts({
+            'driver': imgfmt,
+            'node-name': 'base',
+            'file': {
+                'driver': 'throttle',
+                'throttle-group': 'thrgr',
+                'file': {
+                    'driver': 'file',
+                    'filename': base
+                }
+            }
+        }))
+        self.vm.add_blockdev(self.vm.qmp_to_opts({
+            'driver': imgfmt,
+            'node-name': 'top',
+            'file': {
+                'driver': 'blkdebug',
+                'node-name': 'top-blkdebug',
+                'inject-error': [{
+                    'event': 'pwritev',
+                    'immediately': 'true',
+                    'once': 'true'
+                }],
+                'image': {
+                    'driver': 'file',
+                    'filename': top
+                }
+            },
+            'backing': 'base'
+        }))
+        self.vm.add_device(self.vm.qmp_to_opts({
+            'driver': 'virtio-scsi',
+            'id': 'vscsi'
+        }))
+        self.vm.add_device(self.vm.qmp_to_opts({
+            'driver': 'scsi-hd',
+            'bus': 'vscsi.0',
+            'drive': 'top'
+        }))
+        self.vm.launch()
+
+    def tearDown(self) -> None:
+        self.vm.shutdown()
+        os.remove(top)
+        os.remove(base)
+
+    def test_stream_error_on_reset(self) -> None:
+        # Launch a stream job, which will take at least a second to
+        # complete, because the base image is throttled (so we can
+        # get in between it having started and it having completed)
+        res = self.vm.qmp('block-stream', job_id='stream', device='top')
+        self.assert_qmp(res, 'return', {})
+
+        while True:
+            ev = self.vm.event_wait('JOB_STATUS_CHANGE')
+            if ev['data']['status'] == 'running':
+                # Once the stream job is running, reset the system, which
+                # forces the virtio-scsi device to be reset, thus draining
+                # the stream job, and making it complete.  Completing
+                # inside of that drain should not result in a segfault.
+                res = self.vm.qmp('system_reset')
+                self.assert_qmp(res, 'return', {})
+            elif ev['data']['status'] == 'null':
+                # The test is done once the job is gone
+                break
+
+
+if __name__ == '__main__':
+    # Passes with any format with backing file support, but qed and
+    # qcow1 do not seem to exercise the used-to-be problematic code
+    # path, so there is no point in having them in this list
+    iotests.main(supported_fmts=['qcow2', 'vmdk'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/stream-error-on-reset.out b/tests/qemu-iotests/tests/stream-error-on-reset.out
new file mode 100644
index 0000000000..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-error-on-reset.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
2.33.1
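
For readers who want to reason about the event loop in
test_stream_error_on_reset() without launching a VM, the control flow
can be sketched in isolation. This is a rough sketch only:
run_until_job_gone() and its on_running callback are hypothetical names,
and the status sequence in the usage note is illustrative rather than
captured from a real run. Only the event shape (ev['data']['status'])
and the 'running' and 'null' checks come from the test itself.

```python
def run_until_job_gone(events, on_running):
    """Walk job status events the way the test's while-loop does.

    events:     iterable of dicts shaped like JOB_STATUS_CHANGE events
    on_running: callback fired for every 'running' status (the test
                issues system_reset at this point)
    Returns the list of statuses seen up to and including 'null'.
    """
    seen = []
    for ev in events:
        status = ev['data']['status']
        seen.append(status)
        if status == 'running':
            on_running()     # trigger the reset, and with it the drain
        elif status == 'null':
            break            # the job is gone, so the loop is done
    return seen
```

Feeding it a list such as created, running, aborting, concluded, null
calls on_running once and stops at 'null'. Any other status is recorded
and otherwise ignored, as in the test.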