From: Pavel Mores
To: libvir-list@redhat.com
Cc: Pavel Mores
Date: Fri, 6 Dec 2019 10:11:29 +0100
Message-Id: <20191206091129.16909-4-pmores@redhat.com>
In-Reply-To: <20191206091129.16909-1-pmores@redhat.com>
References: <20191206091129.16909-1-pmores@redhat.com>
Subject: [libvirt] [PATCH 3/3] qemu: fix concurrency crash bug in snapshot revert

Although qemuDomainRevertToSnapshot() correctly begins a job near its start,
that job is implicitly ended whenever qemuProcessStop() is called, because
job state lives in the QEMU driver's private data, which qemuProcessStop()
purges. This commit restores the job by calling qemuProcessBeginJob() after
the qemuProcessStop() invocations.

The first hunk is meant to fix the reported bug (see below). The bug's
reproducibility on my machine was initially much lower than the reported 50%
(more like 5%), but could be raised to practically 100% by keeping
virt-manager connected with the test domain open in its viewer. With the fix
in place, I could not reproduce the bug under any scenario I could think of.

The actual crash occurred because the thread performing the revert, not
holding a job, was able to change the QEMU monitor structure out from under
the thread performing a snapshot delete while the latter was waiting on a
condition variable.

An open question is whether this commit takes the right approach to the fix.
In particular, qemuProcessStop() arguably should not clear jobs at all, in
which case the right thing to do would be to fix qemuProcessStop().
However, qemuProcessStop() has about twenty callers, and while none of them
appears vulnerable to the kind of problem caused by clearing jobs (unlike
qemuDomainRevertToSnapshot(), which starts QEMU again right after stopping
it), some of them might rely on jobs being cleared, and this is not always
easy to verify conclusively. All in all, this commit's approach seems to be
the least bad fix that does not require a potentially risky refactoring.

https://bugzilla.redhat.com/show_bug.cgi?id=1610207

Signed-off-by: Pavel Mores
---
 src/qemu/qemu_driver.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 18bd0101e7..b3769cc479 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16637,9 +16637,15 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
                                                      VIR_DOMAIN_EVENT_STOPPED,
                                                      detail);
                 virObjectEventStateQueue(driver->domainEventState, event);
-                /* Start after stop won't be an async start job, so
-                 * reset to none */
-                jobType = QEMU_ASYNC_JOB_NONE;
+                /* We have to begin a new job as the original one (begun
+                 * near the top of this function) was lost during the purge
+                 * of private data in qemuProcessStop() above.
+                 */
+                if (qemuProcessBeginJob(driver, vm,
+                                        VIR_DOMAIN_JOB_OPERATION_SNAPSHOT_REVERT,
+                                        flags) < 0) {
+                    goto cleanup;
+                }
                 goto load;
             }
         }
@@ -16774,6 +16780,11 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
         event = virDomainEventLifecycleNewFromObj(vm,
                                                   VIR_DOMAIN_EVENT_STOPPED,
                                                   detail);
+        if (qemuProcessBeginJob(driver, vm,
+                                VIR_DOMAIN_JOB_OPERATION_SNAPSHOT_REVERT,
+                                flags) < 0) {
+            goto cleanup;
+        }
     }
 
     if (qemuDomainSnapshotRevertInactive(driver, vm, snap) < 0) {
-- 
2.21.0