From: 仇大玉 (Michael Qiu) <qiudayu@huayun.com>
To: Vladimir Sementsov-Ogievskiy, 08005325@163.com, kwolf@redhat.com, mreitz@redhat.com, jsnow@redhat.com
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: RE: [PATCH v4] blockjob: Fix crash with IOthread when block commit after snapshot
Date: Mon, 1 Feb 2021 11:26:23 +0000

I'm sorry, please forgive my mail client (Outlook).

I have tried your solution, but it doesn't work; it still crashes.
The reason is: we come to bdrv_mirror_top_pwritev() (which means that the mirror-top node is already inserted into the block graph), but its bs->opaque->job is not initialized.

The root cause, though, is that in block_job_create() we released the aio_context (unnecessarily), and the IO thread grabbed the context in the meantime.

The reproducer has two parts. One script runs inside the VM (to generate some workload); call it script A:

#!/bin/bash
for ((i = 1; i <= 100000000; i++)); do
    dd if=/dev/zero of=./xxx bs=1M count=200
    sleep 6
done

The other runs on the hypervisor; call it script B:

#!/bin/bash
for ((i = 1; i <= 10000000; i++)); do
    virsh snapshot-create-as fqtest --atomic --no-metadata --name fq6 \
        --disk-only --diskspec vda,snapshot=external,file=/home/michael/snapshot/fq6.qcow2
    virsh blockcommit fqtest /home/michael/snapshot/fq6.qcow2 \
        --shallow --verbose --wait --pivot --top /home/michael/snapshot/fq6.qcow2
    rm -f /home/michael/snapshot/fq6.qcow2
done

How to reproduce:
1. Start a VM; in my case it is managed by libvirt and named fqtest.
2. Run script B on the hypervisor.
3. After the guest boots up, log in and run script A against vda.

Make sure the IO thread is enabled for vda. Usually it crashes within a few minutes.

The whole thread backtrace is:

[Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
1437        ../block/mirror.c: No such file or directory.
(gdb) p s->job
$17 = (MirrorBlockJob *) 0x0
(gdb) p s->stop
$18 = false
(gdb) bt
#0  0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
#1  0x00005576d0f7f3ab in bdrv_driver_pwritev at ../block/io.c:1174
#2  0x00005576d0f8139d in bdrv_aligned_pwritev at ../block/io.c:1988
#3  0x00005576d0f81b65 in bdrv_co_pwritev_part at ../block/io.c:2156
#4  0x00005576d0f8e6b7 in blk_do_pwritev_part at ../block/block-backend.c:1260
#5  0x00005576d0f8e84d in blk_aio_write_entry at ../block/block-backend.c:1476
#6  0x00005576d1060ddb in coroutine_trampoline at ../util/coroutine-ucontext.c:173
#7  0x00007f7c8d3be0d0 in __start_context at /lib/../lib64/libc.so.6
#8  0x00007f7b52beb1e0 in ()
#9  0x0000000000000000 in ()

Switch to the qemu main thread:
#0  0x00007f903be704ed in __lll_lock_wait at /lib/../lib64/libpthread.so.0
#1  0x00007f903be6bde6 in _L_lock_941 at /lib/../lib64/libpthread.so.0
#2  0x00007f903be6bcdf in pthread_mutex_lock at /lib/../lib64/libpthread.so.0
#3  0x0000564b21456889 in qemu_mutex_lock_impl at ../util/qemu-thread-posix.c:79
#4  0x0000564b213af8a5 in block_job_add_bdrv at ../blockjob.c:224
#5  0x0000564b213b00ad in block_job_create at ../blockjob.c:440
#6  0x0000564b21357c0a in mirror_start_job at ../block/mirror.c:1622
#7  0x0000564b2135a9af in commit_active_start at ../block/mirror.c:1867
#8  0x0000564b2133d132 in qmp_block_commit at ../blockdev.c:2768
#9  0x0000564b2141fef3 in qmp_marshal_block_commit at qapi/qapi-commands-block-core.c:346
#10 0x0000564b214503c9 in do_qmp_dispatch_bh at ../qapi/qmp-dispatch.c:110
#11 0x0000564b21451996 in aio_bh_poll at ../util/async.c:164
#12 0x0000564b2146018e in aio_dispatch at ../util/aio-posix.c:381
#13 0x0000564b2145187e in aio_ctx_dispatch at ../util/async.c:306
#14 0x00007f9040239049 in g_main_context_dispatch at /lib/../lib64/libglib-2.0.so.0
#15 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:232
#16 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:255
#17 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:531
#18 0x0000564b212304e1 in qemu_main_loop at ../softmmu/runstate.c:721
#19 0x0000564b20f7975e in main at ../softmmu/main.c:50
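
To make the race window concrete, here is a minimal annotated sketch of the pre-fix locking pattern in block_job_add_bdrv(). It is simplified from the blockjob.c hunk quoted further below, not the literal QEMU source, and the success-path bookkeeping is elided:

/* Sketch of blockjob.c:block_job_add_bdrv() before the fix (simplified;
 * names and calls are taken from the patch hunk quoted below). */
int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
                       uint64_t perm, uint64_t shared_perm, Error **errp)
{
    BdrvChild *c;

    bdrv_ref(bs);
    if (job->job.aio_context != qemu_get_aio_context()) {
        /* The main thread drops the job's AioContext lock here... */
        aio_context_release(job->job.aio_context);
    }
    /* ...so in this window the IO thread can take the lock and submit
     * guest writes.  mirror_top_bs is already inserted into the graph
     * (bdrv_append() ran and its drained section already ended), but
     * bs->opaque->job is still NULL, so bdrv_mirror_top_pwritev()
     * crashes on the NULL MirrorBlockJob. */
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               job->job.aio_context, perm, shared_perm,
                               job, errp);
    if (job->job.aio_context != qemu_get_aio_context()) {
        aio_context_acquire(job->job.aio_context);
    }
    return c == NULL ? -EPERM : 0;  /* remainder of the function elided */
}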

Thanks,
Michael

-----Original Message-----
From: Vladimir Sementsov-Ogievskiy
Sent: 1 February 2021 18:28
To: 08005325@163.com; kwolf@redhat.com; mreitz@redhat.com; jsnow@redhat.com
Cc: 仇大玉; qemu-devel@nongnu.org; qemu-block@nongnu.org
Subject: Re: [PATCH v4] blockjob: Fix crash with IOthread when block commit after snapshot

Hi! Thanks for fixing, and sorry for the delay!

Please send each new version of a patch as a separate thread. That is the rule from https://wiki.qemu.org/Contribute/SubmitAPatch; it is more readable, and it makes it less likely that your patch will be missed.

28.01.2021 04:30, 08005325@163.com wrote:
> From: Michael Qiu
>
> v4: rebase to latest code
>
> v3: reformat the commit log, remove duplicate content
>
> v2: modify the coredump backtrace within commit log with the newest
> qemu with master branch

Such things shouldn't be in the commit message. You may put such comments after the --- line [*] in a generated patch email.

>
> Currently, if the guest has workloads, the IO thread will acquire the
> aio_context lock before doing io_submit, which leads to a segfault
> when doing block commit after snapshot. Just like below:

Do you have some reproducer script?

>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
> 0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
> 1437        ../block/mirror.c: No such file or directory.
> (gdb) p s->job
> $17 = (MirrorBlockJob *) 0x0
> (gdb) p s->stop
> $18 = false
>
> (gdb) bt
>
> Switch to qemu main thread:
> /lib/../lib64/libpthread.so.0
> /lib/../lib64/libpthread.so.0
> ../util/qemu-thread-posix.c:79
> qapi/qapi-commands-block-core.c:346
> ../qapi/qmp-dispatch.c:110
> /lib/../lib64/libglib-2.0.so.0

Not a very informative backtrace...

>
> In the IO thread, when we reach bdrv_mirror_top_pwritev, the job is
> NULL and the stop field is false. This means the MirrorBDSOpaque "s"
> object has not been initialized yet; that object is initialized by
> block_job_create(), but the initialization is stuck acquiring the lock.

Could you show the other thread's backtrace?

Hm, so you argue that we come to bdrv_mirror_top_pwritev() (which means that the mirror-top node is already inserted into the block graph), but its bs->opaque is not initialized?

Hmm, indeed, in mirror_start_job we do insert mirror_top_bs before the block_job_create() call. But we should do all of that in a drained section, so that no parallel IO requests can come in. And we do have a drained section, but it finishes immediately after bdrv_append, while bs_opaque is still not initialized. Probably we just need to expand it? Maybe:

diff --git a/block/mirror.c b/block/mirror.c
index 8e1ad6eceb..0a6bfc1230 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1610,11 +1610,11 @@ static BlockJob *mirror_start_job(
     bdrv_ref(mirror_top_bs);
     bdrv_drained_begin(bs);
     bdrv_append(mirror_top_bs, bs, &local_err);
-    bdrv_drained_end(bs);
 
     if (local_err) {
         bdrv_unref(mirror_top_bs);
         error_propagate(errp, local_err);
+        bdrv_drained_end(bs);
         return NULL;
     }
 
@@ -1789,6 +1789,8 @@ static BlockJob *mirror_start_job(
     trace_mirror_start(bs, s, opaque);
     job_start(&s->common.job);
 
+    bdrv_drained_end(bs);
+
     return &s->common;
 
 fail:
@@ -1813,6 +1815,8 @@ fail:
 
     bdrv_unref(mirror_top_bs);
 
+    bdrv_drained_end(bs);
+
     return NULL;
 }

Could you check whether it helps?

>
> The root cause is that qemu does a release/acquire while holding the
> lock; at the same time, the IO thread gets the lock right after the
> release stage, and the crash occurs.
>
> Actually, in this situation, job->job.aio_context will not be equal to
> qemu_get_aio_context(); it will be the same as bs->aio_context. Thus
> there is no need to release the lock, because bdrv_root_attach_child()
> will not change the context.
>
> This patch fixes the issue.
>
> Signed-off-by: Michael Qiu
> ---

[*] Here you could add any comments which will not go into the final commit message, like the version history.

>  blockjob.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/blockjob.c b/blockjob.c
> index 98ac8af982..51a09f3b60 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -214,13 +214,15 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>      BdrvChild *c;
>  
>      bdrv_ref(bs);
> -    if (job->job.aio_context != qemu_get_aio_context()) {
> +    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
> +        job->job.aio_context != qemu_get_aio_context()) {
>          aio_context_release(job->job.aio_context);
>      }
>      c = bdrv_root_attach_child(bs, name, &child_job, 0,
>                                 job->job.aio_context, perm, shared_perm, job,
>                                 errp);
> -    if (job->job.aio_context != qemu_get_aio_context()) {
> +    if (bdrv_get_aio_context(bs) != job->job.aio_context &&
> +        job->job.aio_context != qemu_get_aio_context()) {

That's a wrong check: it will never reacquire the lock on the success path, because after a successful attach the bs context will definitely be equal to the job context.

I think you need a boolean variable at the start of the function, initialized to the condition; then, after _attach_child(), you don't recheck the condition but rely on the variable (sketched below).

>          aio_context_acquire(job->job.aio_context);
>      }
>      if (c == NULL) {
>
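
Concretely, that suggestion could look like the following untested sketch (the variable name is illustrative; the remainder of the function is elided):

int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
                       uint64_t perm, uint64_t shared_perm, Error **errp)
{
    BdrvChild *c;
    /* Evaluate the condition once, before bdrv_root_attach_child() can
     * move bs into the job's AioContext, and reuse the saved answer when
     * re-acquiring, so release and acquire always stay paired. */
    bool need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context &&
                            job->job.aio_context != qemu_get_aio_context();

    bdrv_ref(bs);
    if (need_context_ops) {
        aio_context_release(job->job.aio_context);
    }
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               job->job.aio_context, perm, shared_perm,
                               job, errp);
    if (need_context_ops) {
        aio_context_acquire(job->job.aio_context);
    }
    if (c == NULL) {
        return -EPERM;
    }
    /* ...remainder of the original function unchanged... */
    return 0;
}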

The code was introduced by Kevin in 132ada80c4a6 "block: Adjust AioContexts when attaching nodes", so I think we need his opinion.

You may also add "Fixes: 132ada80c4a6fea7b67e8bb0a5fd299994d927c6", especially if you check that your case doesn't fail before that commit.

I think the idea itself is correct, as bdrv_root_attach_child will not call any of *_set_aio_*, so there is no reason to release the lock. So it shouldn't hurt, and it's great if it fixes a crash.

When the side effect of a function is temporarily releasing some lock, it's hard to be sure that all callers are OK with it...

-- 
Best regards,
Vladimir