From nobody Sun Feb  8 21:17:18 2026
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as
 permitted sender) client-ip=208.118.235.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com;
	spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted
 sender)  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org;
	dmarc=fail(p=none dis=none)  header.from=redhat.com
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by
 mx.zohomail.com
	with SMTPS id 151931537948018.1098554772027;
 Thu, 22 Feb 2018 08:02:59 -0800 (PST)
Received: from localhost ([::1]:39336 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <qemu-devel-bounces+importer=patchew.org@nongnu.org>)
	id 1eotKf-0006Gz-Nb
	for importer@patchew.org; Thu, 22 Feb 2018 11:02:53 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50192)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHf-00040X-M4
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:48 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHe-0001gn-OI
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:47 -0500
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:35120 helo=mx1.redhat.com)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <eblake@redhat.com>)
	id 1eotHZ-0001bb-FL; Thu, 22 Feb 2018 10:59:41 -0500
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com
	[10.11.54.3])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 9FF7E40753A2;
	Thu, 22 Feb 2018 15:59:31 +0000 (UTC)
Received: from red.redhat.com (ovpn-122-122.rdu2.redhat.com [10.10.122.122])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 25ECA10A85AF;
	Thu, 22 Feb 2018 15:59:31 +0000 (UTC)
From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Date: Thu, 22 Feb 2018 09:59:19 -0600
Message-Id: <20180222155922.9833-2-eblake@redhat.com>
In-Reply-To: <20180222155922.9833-1-eblake@redhat.com>
References: <20180222155922.9833-1-eblake@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.11.55.7]);
	Thu, 22 Feb 2018 15:59:31 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]);
	Thu, 22 Feb 2018 15:59:31 +0000 (UTC) for IP:'10.11.54.3'
	DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com'
	HELO:'smtp.corp.redhat.com' FROM:'eblake@redhat.com' RCPT:''
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
	[fuzzy]
X-Received-From: 66.187.233.73
Subject: [Qemu-devel] [PATCH v3 1/4] qcow2: Prefer byte-based calls into
 bs->file
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: kwolf@redhat.com, berto@igalia.com, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+importer=patchew.org@nongnu.org>
X-ZohoMail: RSF_0  Z_629925259 SPT_0
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

We had only three sector-based stragglers left; convert them to use
our preferred byte-based accesses.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>

---
v2: indentation fix
---
 block/qcow2-cluster.c  | 5 ++---
 block/qcow2-refcount.c | 6 +++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e406b0f3b9e..85be7d5e340 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1615,13 +1615,12 @@ int qcow2_decompress_cluster(BlockDriverState *bs, =
uint64_t cluster_offset)
         }

         BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
-        ret =3D bdrv_read(bs->file, coffset >> 9, s->cluster_data,
-                        nb_csectors);
+        ret =3D bdrv_pread(bs->file, coffset, s->cluster_data, csize);
         if (ret < 0) {
             return ret;
         }
         if (decompress_buffer(s->cluster_cache, s->cluster_size,
-                              s->cluster_data + sector_offset, csize) < 0)=
 {
+                              s->cluster_data, csize) < 0) {
             return -EIO;
         }
         s->cluster_cache_offset =3D coffset;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d46b69d7f34..28afbb1b5ea 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2310,8 +2310,8 @@ write_refblocks:
         on_disk_refblock =3D (void *)((char *) *refcount_table +
                                     refblock_index * s->cluster_size);

-        ret =3D bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
-                         on_disk_refblock, s->cluster_sectors);
+        ret =3D bdrv_pwrite(bs->file, refblock_offset, on_disk_refblock,
+                          s->cluster_size);
         if (ret < 0) {
             fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret)=
);
             goto fail;
@@ -2533,7 +2533,7 @@ fail:
  * - 0 if writing to this offset will not affect the mentioned metadata
  * - a positive QCow2MetadataOverlap value indicating one overlapping sect=
ion
  * - a negative value (-errno) indicating an error while performing a chec=
k,
- *   e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
+ *   e.g. when bdrv_pread failed on QCOW2_OL_INACTIVE_L2
  */
 int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t of=
fset,
                                  int64_t size)
--=20
2.14.3


From nobody Sun Feb  8 21:17:18 2026
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as
 permitted sender) client-ip=208.118.235.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com;
	spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted
 sender)  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org;
	dmarc=fail(p=none dis=none)  header.from=redhat.com
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by
 mx.zohomail.com
	with SMTPS id 1519315472999386.4828633632085;
 Thu, 22 Feb 2018 08:04:32 -0800 (PST)
Received: from localhost ([::1]:39351 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <qemu-devel-bounces+importer=patchew.org@nongnu.org>)
	id 1eotMG-0007Tv-7o
	for importer@patchew.org; Thu, 22 Feb 2018 11:04:32 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50196)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHf-00040y-VB
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:49 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHe-0001gv-Tf
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:48 -0500
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55198 helo=mx1.redhat.com)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <eblake@redhat.com>)
	id 1eotHZ-0001bZ-FA; Thu, 22 Feb 2018 10:59:41 -0500
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com
	[10.11.54.3])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 49C188011454;
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC)
Received: from red.redhat.com (ovpn-122-122.rdu2.redhat.com [10.10.122.122])
	by smtp.corp.redhat.com (Postfix) with ESMTP id C52AB10A85AF;
	Thu, 22 Feb 2018 15:59:31 +0000 (UTC)
From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Date: Thu, 22 Feb 2018 09:59:20 -0600
Message-Id: <20180222155922.9833-3-eblake@redhat.com>
In-Reply-To: <20180222155922.9833-1-eblake@redhat.com>
References: <20180222155922.9833-1-eblake@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.11.55.8]);
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]);
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC) for IP:'10.11.54.3'
	DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com'
	HELO:'smtp.corp.redhat.com' FROM:'eblake@redhat.com' RCPT:''
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
	[fuzzy]
X-Received-From: 66.187.233.73
Subject: [Qemu-devel] [PATCH v3 2/4] qcow2: Document some maximum size
 constraints
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: kwolf@redhat.com, berto@igalia.com, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+importer=patchew.org@nongnu.org>
X-ZohoMail: RSF_0  Z_629925259 SPT_0
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Although off_t permits up to 63 bits (8EB) of file offsets, in
practice, we're going to hit other limits first.  Document some
of those limits in the qcow2 spec, and how choice of cluster size
can influence some of the limits.

While at it, notice that since we cannot map any virtual cluster
to any address higher than 64 PB (56 bits) (due to the L1/L2 field
encoding), it makes little sense to require the refcount table to
access host offsets beyond that point.  Mark the upper bits of
the refcount table entries as reserved, with no ill effects, since
it is unlikely that there are any existing images larger than 64PB
in the first place, and thus all existing images already have those
bits as 0.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 docs/interop/qcow2.txt | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index feb711fb6a8..2417522ca9b 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -40,7 +40,16 @@ The first cluster of a qcow2 image contains the file hea=
der:
                     with larger cluster sizes.

          24 - 31:   size
-                    Virtual disk size in bytes
+                    Virtual disk size in bytes.
+
+                    Note: with a 2 MB cluster size, the maximum
+                    virtual size is 2 EB (61 bits) for a sparse file,
+                    but other sizing limitations mean that an image
+                    cannot have more than 64 PB of populated clusters
+                    (and may hit other sizing limitations as well,
+                    such as underlying protocol limits).  With a 512
+                    byte cluster size, the maximum virtual size drops
+                    to 128 GB (37 bits).

          32 - 35:   crypt_method
                     0 for no encryption
@@ -318,6 +327,11 @@ for each host cluster. A refcount of 0 means that the =
cluster is free, 1 means
 that it is used, and >=3D 2 means that it is used and any write access must
 perform a COW (copy on write) operation.

+The refcount table has implications on the maximum host file size; a
+larger cluster size is required for the refcount table to cover larger
+offsets.  Furthermore, all qcow2 metadata must reside at offsets below
+64 PB (56 bits).
+
 The refcounts are managed in a two-level table. The first level is called
 refcount table and has a variable size (which is stored in the header). The
 refcount table can cover multiple clusters, however it needs to be contigu=
ous
@@ -341,7 +355,7 @@ Refcount table entry:

     Bit  0 -  8:    Reserved (set to 0)

-         9 - 63:    Bits 9-63 of the offset into the image file at which t=
he
+         9 - 55:    Bits 9-55 of the offset into the image file at which t=
he
                     refcount block starts. Must be aligned to a cluster
                     boundary.

@@ -349,6 +363,8 @@ Refcount table entry:
                     been allocated. All refcounts managed by this refcount=
 block
                     are 0.

+        56 - 63:    Reserved (set to 0)
+
 Refcount block entry (x =3D refcount_bits - 1):

     Bit  0 -  x:    Reference count of the cluster. If refcount_bits impli=
es a
@@ -365,6 +381,11 @@ The L1 table has a variable size (stored in the header=
) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
 exactly one cluster in size.

+The L1 and L2 tables have implications on the maximum virtual file
+size; a larger cluster size is required for guest to have access to a
+larger size.  Furthermore, a virtual cluster must map to a host offset
+below 64 PB (56 bits).
+
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows:

@@ -427,7 +448,9 @@ Standard Cluster Descriptor:
 Compressed Clusters Descriptor (x =3D 62 - (cluster_bits - 8)):

     Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
-                    cluster or sector boundary!
+                    cluster or sector boundary!  If cluster_bits is
+                    small enough that this field includes bits beyond
+                    55, those upper bits must be set to 0.

          x - 61:    Number of additional 512-byte sectors used for the
                     compressed data, beyond the sector containing the offs=
et
--=20
2.14.3


From nobody Sun Feb  8 21:17:18 2026
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as
 permitted sender) client-ip=208.118.235.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com;
	spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted
 sender)  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org;
	dmarc=fail(p=none dis=none)  header.from=redhat.com
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by
 mx.zohomail.com
	with SMTPS id 1519315536307672.7722049382938;
 Thu, 22 Feb 2018 08:05:36 -0800 (PST)
Received: from localhost ([::1]:39352 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <qemu-devel-bounces+importer=patchew.org@nongnu.org>)
	id 1eotNA-0008Ah-A5
	for importer@patchew.org; Thu, 22 Feb 2018 11:05:28 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50270)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHo-00049h-Ey
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHk-0001kA-Ho
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:56 -0500
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:52894 helo=mx1.redhat.com)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <eblake@redhat.com>)
	id 1eotHZ-0001ba-FG; Thu, 22 Feb 2018 10:59:41 -0500
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com
	[10.11.54.3])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id EAABA8425B;
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC)
Received: from red.redhat.com (ovpn-122-122.rdu2.redhat.com [10.10.122.122])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 717DD10A85AF;
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC)
From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Date: Thu, 22 Feb 2018 09:59:21 -0600
Message-Id: <20180222155922.9833-4-eblake@redhat.com>
In-Reply-To: <20180222155922.9833-1-eblake@redhat.com>
References: <20180222155922.9833-1-eblake@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.11.55.2]);
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]);
	Thu, 22 Feb 2018 15:59:32 +0000 (UTC) for IP:'10.11.54.3'
	DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com'
	HELO:'smtp.corp.redhat.com' FROM:'eblake@redhat.com' RCPT:''
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
	[fuzzy]
X-Received-From: 66.187.233.73
Subject: [Qemu-devel] [PATCH v3 3/4] qcow2: Don't allow overflow during
 cluster allocation
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: kwolf@redhat.com, berto@igalia.com, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+importer=patchew.org@nongnu.org>
X-ZohoMail: RSF_0  Z_629925259 SPT_0
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Our code was already checking that we did not attempt to
allocate more clusters than what would fit in an INT64 (the
physical maximimum if we can access a full off_t's worth of
data).  But this does not catch smaller limits enforced by
various spots in the qcow2 image description: L1 and normal
clusters of L2 are documented as having bits 63-56 reserved
for other purposes, capping our maximum offset at 64PB (bit
55 is the maximum bit set).  And for compressed images with
2M clusters, the cap drops the maximum offset to bit 48, or
a maximum offset of 512TB.  If we overflow that offset, we
would write compressed data into one place, but try to
decompress from another, which won't work.

I don't have 512TB handy to prove whether things break if we
compress so much data that we overflow that limit, and don't
think that iotests can (quickly) test it either.  Test 138
comes close (it corrupts an image into thinking something lives
at 32PB, which is half the maximum for L1 sizing - although
it relies on 512-byte clusters).  But that test points out
that we will generally hit other limits first (such as running
out of memory for the refcount table, or exceeding file system
limits like 16TB on ext4, etc), so this is more a theoretical
safety valve than something likely to be hit.

Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>
---
v3: use s->cluster_offset_mask instead of open-coding it [Berto],
use MIN() to clamp offset on small cluster size, add spec patch
first to justify clamping even on refcount allocations
---
 block/qcow2.h          |  6 ++++++
 block/qcow2-refcount.c | 21 ++++++++++++++-------
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 883802241fb..560008c331d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -41,6 +41,12 @@
 #define QCOW_MAX_CRYPT_CLUSTERS 32
 #define QCOW_MAX_SNAPSHOTS 65536

+/* Field widths in qcow2 mean normal cluster offsets cannot reach
+ * 64PB; depending on cluster size, compressed clusters can have a
+ * smaller limit (64PB for up to 16k clusters, then ramps down to
+ * 512TB for 2M clusters).  */
+#define QCOW_MAX_CLUSTER_OFFSET ((1ULL << 56) - 1)
+
 /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
  * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
 #define QCOW_MAX_REFTABLE_SIZE 0x800000
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 28afbb1b5ea..c46a2257a9e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -31,7 +31,8 @@
 #include "qemu/bswap.h"
 #include "qemu/cutils.h"

-static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
+static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size,
+                                    uint64_t max);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length, uint64_t adden=
d,
                             bool decrease, enum qcow2_discard_type type);
@@ -362,7 +363,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
     }

     /* Allocate the refcount block itself and mark it as used */
-    int64_t new_block =3D alloc_clusters_noref(bs, s->cluster_size);
+    int64_t new_block =3D alloc_clusters_noref(bs, s->cluster_size,
+                                             QCOW_MAX_CLUSTER_OFFSET);
     if (new_block < 0) {
         return new_block;
     }
@@ -947,7 +949,8 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,


 /* return < 0 if error */
-static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size)
+static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size,
+                                    uint64_t max)
 {
     BDRVQcow2State *s =3D bs->opaque;
     uint64_t i, nb_clusters, refcount;
@@ -972,9 +975,9 @@ retry:
     }

     /* Make sure that all offsets in the "allocated" range are representab=
le
-     * in an int64_t */
+     * in the requested max */
     if (s->free_cluster_index > 0 &&
-        s->free_cluster_index - 1 > (INT64_MAX >> s->cluster_bits))
+        s->free_cluster_index - 1 > (max >> s->cluster_bits))
     {
         return -EFBIG;
     }
@@ -994,7 +997,7 @@ int64_t qcow2_alloc_clusters(BlockDriverState *bs, uint=
64_t size)

     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC);
     do {
-        offset =3D alloc_clusters_noref(bs, size);
+        offset =3D alloc_clusters_noref(bs, size, QCOW_MAX_CLUSTER_OFFSET);
         if (offset < 0) {
             return offset;
         }
@@ -1076,7 +1079,11 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int =
size)
     free_in_cluster =3D s->cluster_size - offset_into_cluster(s, offset);
     do {
         if (!offset || free_in_cluster < size) {
-            int64_t new_cluster =3D alloc_clusters_noref(bs, s->cluster_si=
ze);
+            int64_t new_cluster;
+
+            new_cluster =3D alloc_clusters_noref(bs, s->cluster_size,
+                                               MIN(s->cluster_offset_mask,
+                                                   QCOW_MAX_CLUSTER_OFFSET=
));
             if (new_cluster < 0) {
                 return new_cluster;
             }
--=20
2.14.3


From nobody Sun Feb  8 21:17:18 2026
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as
 permitted sender) client-ip=208.118.235.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com;
	spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted
 sender)  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org;
	dmarc=fail(p=none dis=none)  header.from=redhat.com
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by
 mx.zohomail.com
	with SMTPS id 1519315466449430.3644475012462;
 Thu, 22 Feb 2018 08:04:26 -0800 (PST)
Received: from localhost ([::1]:39349 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <qemu-devel-bounces+importer=patchew.org@nongnu.org>)
	id 1eotM9-0007MX-Iv
	for importer@patchew.org; Thu, 22 Feb 2018 11:04:25 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50212)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHg-00041Z-Nd
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:50 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1eotHf-0001hL-8A
	for qemu-devel@nongnu.org; Thu, 22 Feb 2018 10:59:48 -0500
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:38116 helo=mx1.redhat.com)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <eblake@redhat.com>)
	id 1eotHZ-0001bc-FE; Thu, 22 Feb 2018 10:59:41 -0500
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com
	[10.11.54.3])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 9DC8BEB706;
	Thu, 22 Feb 2018 15:59:33 +0000 (UTC)
Received: from red.redhat.com (ovpn-122-122.rdu2.redhat.com [10.10.122.122])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 1E50910A85AF;
	Thu, 22 Feb 2018 15:59:33 +0000 (UTC)
From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Date: Thu, 22 Feb 2018 09:59:22 -0600
Message-Id: <20180222155922.9833-5-eblake@redhat.com>
In-Reply-To: <20180222155922.9833-1-eblake@redhat.com>
References: <20180222155922.9833-1-eblake@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.11.55.1]);
	Thu, 22 Feb 2018 15:59:33 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]);
	Thu, 22 Feb 2018 15:59:33 +0000 (UTC) for IP:'10.11.54.3'
	DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com'
	HELO:'smtp.corp.redhat.com' FROM:'eblake@redhat.com' RCPT:''
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
	[fuzzy]
X-Received-From: 66.187.233.73
Subject: [Qemu-devel] [PATCH v3 4/4] qcow2: Avoid memory over-allocation on
 compressed images
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: kwolf@redhat.com, berto@igalia.com, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+importer=patchew.org@nongnu.org>
X-ZohoMail: RSF_0  Z_629925259 SPT_0
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When reading a compressed image, we were allocating s->cluster_data
to 32*cluster_size + 512 (possibly over 64 megabytes, for an image
with 2M clusters).  Let's check out the history:

Back when qcow2 was first written, we used s->cluster_data for
everything, including copy_sectors() and encryption, where we want
to operate on more than one cluster at once.  Obviously, at that
point, the buffer had to be aligned for other users, even though
compression itself doesn't require any alignment (the fact that
the compressed data generally starts mid-sector means that aligning
our buffer buys us nothing - either the protocol already supports
byte-based access into whatever offset we want, or we are already
using a bounce buffer to read a full sector, and copying into
our destination no longer requires alignment).

But commit 1b9f1491 (v1.1!) changed things to allocate parallel
buffers on demand rather than sharing a single buffer, for encryption
and COW, leaving compression as the final client of s->cluster_data.
That use was still preserved, because if a single compressed cluster
is read more than once, we reuse the cache instead of decompressing
it a second time (someday, we may come up with better caching to
avoid wasting repeated decompressions while still being more parallel,
but that is a task for another patch; the XXX comment in
qcow2_co_preadv for QCOW2_CLUSTER_COMPRESSED is telling).

Much later, in commit de82815d (v2.2), we noticed that a 64M
allocation is prone to failure, so we switched over to a graceful
memory allocation error message.  Elsewhere in the code, we do
g_malloc(2 * cluster_size) without ever checking for failure, but
even 4M starts to be large enough that trying to be nice is worth
the effort, so we want to keep that aspect.

Then even later, in 3e4c7052 (2.11), we realized that allocating
a large buffer up front for every qcow2 image is expensive, and
switched to lazy allocation only for images that actually had
compressed clusters.  But in the process, we never even bothered
to check whether what we were allocating still made sense in its
new context!

So, it's time to cut back on the waste.  A compressed cluster
written by qemu will NEVER occupy more than an uncompressed
cluster, but based on mid-sector alignment, we may still need
to read 1 cluster + 1 sector in order to recover enough bytes
for the decompression.  But third-party producers of qcow2 may
not be as smart, and gzip DOES document that because the
compression stream adds metadata, and because of the pigeonhole
principle, there are worst case scenarios where attempts to
compress will actually inflate an image, by up to 0.015% (or 62
sectors larger for an unfortunate 2M compression).  In fact,
the qcow2 spec permits an all-ones sector count, plus 511 bytes
from the sector containing the initial offset, for a maximum
read of nearly 2 full clusters; and thanks to the way
decompression works, even though such a value is probably too
large for the actual compressed data, it really doesn't matter
if we read too much (gzip ignores slop, once it has decoded a
full cluster).  So it's easier to just allocate cluster_data
to be as large as we can ever possibly see; even if it still
wastes up to 2M on any image created by qcow2, that's still an
improvment of 60M less waste than pre-patch.

Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>
---
v3: tighten allocation [Berto]
v2: actually check allocation failure (previous version meant
to use g_malloc, but ended up posted with g_try_malloc without
checking); add assertions outside of conditional, improve
commit message to better match reality now that qcow2 spec bug
has been fixed
---
 block/qcow2-cluster.c | 27 ++++++++++++++++++---------
 block/qcow2.c         |  2 +-
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 85be7d5e340..af51d71d012 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1598,20 +1598,29 @@ int qcow2_decompress_cluster(BlockDriverState *bs, =
uint64_t cluster_offset)
         sector_offset =3D coffset & 511;
         csize =3D nb_csectors * 512 - sector_offset;

-        /* Allocate buffers on first decompress operation, most images are
-         * uncompressed and the memory overhead can be avoided.  The buffe=
rs
-         * are freed in .bdrv_close().
+        /* Allocate buffers on the first decompress operation; most
+         * images are uncompressed and the memory overhead can be
+         * avoided.  The buffers are freed in .bdrv_close().  qemu
+         * never writes an inflated cluster, and gzip itself never
+         * inflates a problematic cluster by more than 0.015%, but the
+         * qcow2 format allows up to 1 byte short of 2 full clusters
+         * when including the sector containing offset.  gzip ignores
+         * trailing slop, so just allocate that much up front rather
+         * than reject third-party images with overlarge csize.
          */
+        assert(!!s->cluster_data =3D=3D !!s->cluster_cache);
+        assert(csize < 2 * s->cluster_size);
         if (!s->cluster_data) {
-            /* one more sector for decompressed data alignment */
-            s->cluster_data =3D qemu_try_blockalign(bs->file->bs,
-                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
+            s->cluster_data =3D g_try_malloc(2 * s->cluster_size);
             if (!s->cluster_data) {
                 return -ENOMEM;
             }
-        }
-        if (!s->cluster_cache) {
-            s->cluster_cache =3D g_malloc(s->cluster_size);
+            s->cluster_cache =3D g_try_malloc(s->cluster_size);
+            if (!s->cluster_cache) {
+                g_free(s->cluster_data);
+                s->cluster_data =3D NULL;
+                return -ENOMEM;
+            }
         }

         BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
diff --git a/block/qcow2.c b/block/qcow2.c
index 288b5299d80..6ad3436e0e5 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2103,7 +2103,7 @@ static void qcow2_close(BlockDriverState *bs)
     g_free(s->image_backing_format);

     g_free(s->cluster_cache);
-    qemu_vfree(s->cluster_data);
+    g_free(s->cluster_data);
     qcow2_refcount_close(bs);
     qcow2_free_snapshots(bs);
 }
--=20
2.14.3