From: Jean-Louis Dupond
Date: Tue, 3 Feb 2026 15:25:30 +0100
To: qemu-devel@nongnu.org, f.ebner@proxmox.com, kwolf@redhat.com
Cc: dionbosschieter@gmail.com
Subject: (in guest) disk corruption during snapshots
Message-ID: <4853b0e5-8ec3-41e9-9a53-b1912b8e4449@dupond.be>

Hi,

For some months we have been observing disk corruption inside VMs when backups (which trigger snapshots) are enabled.
After a lot of troubleshooting, we were able to track down the commit that caused it:
https://gitlab.com/qemu-project/qemu/-/commit/058cfca5645a9ed7cb2bdb77d15f2eacaf343694

More info in the issue:
https://gitlab.com/qemu-project/qemu/-/issues/3273

This seems to be caused by a race between disabling the dirty bitmap and the write tracking implemented in the mirror top layer.
Kevin shared a possible solution with me:

diff --git a/block/mirror.c b/block/mirror.c
index b344182c747..f76e43f22c1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1122,6 +1122,9 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
      * accessing it.
      */
     mirror_top_opaque->job = s;
+    if (s->copy_mode != MIRROR_COPY_MODE_WRITE_BLOCKING) {
+        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+    }
 
     assert(!s->dbi);
     s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
@@ -2018,7 +2021,9 @@ static BlockJob *mirror_start_job(
      * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
      * mode.
      */
-    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+    if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
+        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+    }
 
     bdrv_graph_wrlock_drained();
     ret = block_job_add_bdrv(&s->common, "source", bs, 0,
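
To make the suspected window easier to discuss, here is a toy model of the ordering, not QEMU code. It assumes (my reading of the diff and its comments, not verified against the full source) that in background mode a write is recorded either by the dirty bitmap while it is still enabled, or by the mirror top layer once mirror_top_opaque->job has been set in mirror_run(). Every name below (toy_mirror, toy_guest_write, toy_window_write_tracked, the TOY_* constants) is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
struct toy_mirror {
    bool bitmap_enabled; /* an enabled bitmap records guest writes itself */
    bool job_set;        /* stands in for mirror_top_opaque->job != NULL */
    bool dirty;          /* "some region still has to be copied" */
};

/*
 * A guest write is recorded if the bitmap is enabled, or if the filter
 * already knows the job (assumed behaviour of the mirror top layer in
 * background mode). Otherwise nobody records it.
 */
void toy_guest_write(struct toy_mirror *m)
{
    if (m->bitmap_enabled || m->job_set) {
        m->dirty = true;
    }
}

enum toy_order {
    TOY_DISABLE_AT_START, /* ordering before the patch */
    TOY_DISABLE_AT_RUN    /* ordering with the patch (background mode) */
};

/*
 * Returns true if a write issued between job creation and job start
 * ends up recorded as dirty.
 */
bool toy_window_write_tracked(enum toy_order order)
{
    struct toy_mirror m = {
        .bitmap_enabled = true,
        .job_set = false,
        .dirty = false,
    };

    /* Job creation: corresponds to mirror_start_job() in the diff. */
    if (order == TOY_DISABLE_AT_START) {
        m.bitmap_enabled = false;
    }

    /* The window: a guest write arrives before the job coroutine runs. */
    toy_guest_write(&m);

    /* Job start: corresponds to mirror_run() in the diff. */
    m.job_set = true;
    if (order == TOY_DISABLE_AT_RUN) {
        m.bitmap_enabled = false;
    }

    return m.dirty;
}
```

In this model toy_window_write_tracked(TOY_DISABLE_AT_START) returns false (the write in the window is lost), while toy_window_write_tracked(TOY_DISABLE_AT_RUN) returns true. The sketch covers background mode only; with MIRROR_COPY_MODE_WRITE_BLOCKING the patch still disables the bitmap in mirror_start_job().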


We have been running this for some hours already, and it seems to fix the issue.

Let's open the discussion on whether this is the proper way to fix it, or whether there are better alternatives :)

Thanks
Jean-Louis
