[Qemu-devel] [PATCH v4 2/5] qcow2: Document some maximum size constraints

Eric Blake posted 5 patches 7 years, 7 months ago
Posted by Eric Blake 7 years, 7 months ago
Although off_t permits up to 63 bits (8EB) of file offsets, in
practice, we're going to hit other limits first.  Document some
of those limits in the qcow2 spec, and how choice of cluster size
can influence some of the limits.

While at it, notice that since we cannot map any virtual cluster
to any address higher than 64 PB (56 bits) (due to the current L1/L2
field encoding stopping at bit 55), it makes little sense to require
the refcount table to access host offsets beyond that point.  Mark
the upper bits of the refcount table entries as reserved to match
the L1/L2 table, with no ill effects, since it is unlikely that there
are any existing images larger than 64PB in the first place, and thus
all existing images already have those bits as 0.  If 64PB proves to
be too small in the future, we could enlarge all three uses of bit
55 into the reserved bits at that time.

However, there is one limit that reserved bits don't help with: for
compressed clusters, the L2 layout requires an ever-smaller maximum
host offset as cluster size gets larger, down to a 512 TB maximum
with 2M clusters.

Signed-off-by: Eric Blake <eblake@redhat.com>

--
v4: more wording tweaks
v3: new patch
---
 docs/interop/qcow2.txt | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index feb711fb6a8..e32d391e66b 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -40,7 +40,17 @@ The first cluster of a qcow2 image contains the file header:
                     with larger cluster sizes.

          24 - 31:   size
-                    Virtual disk size in bytes
+                    Virtual disk size in bytes.
+
+                    Note: with a 2 MB cluster size, the maximum
+                    virtual size is 2 EB (61 bits) for a sparse file,
+                    but other sizing limitations in refcount and L1/L2
+                    tables mean that an image cannot have more than 64
+                    PB of populated clusters (and an image may hit
+                    other sizing limitations as well, such as
+                    underlying protocol limits).  With a 512 byte
+                    cluster size, the maximum virtual size drops to
+                    128 GB (37 bits).

          32 - 35:   crypt_method
                     0 for no encryption
@@ -318,6 +328,13 @@ for each host cluster. A refcount of 0 means that the cluster is free, 1 means
 that it is used, and >= 2 means that it is used and any write access must
 perform a COW (copy on write) operation.

+The refcount table has implications on the maximum host file size; a
+larger cluster size is required for the refcount table to cover larger
+offsets.  Furthermore, all qcow2 metadata must currently reside at
+offsets below 64 PB (56 bits) (this limit could be enlarged by putting
+reserved bits into use, but only if a similar limit on L1/L2 tables is
+revisited at the same time).
+
 The refcounts are managed in a two-level table. The first level is called
 refcount table and has a variable size (which is stored in the header). The
 refcount table can cover multiple clusters, however it needs to be contiguous
@@ -341,7 +358,7 @@ Refcount table entry:

     Bit  0 -  8:    Reserved (set to 0)

-         9 - 63:    Bits 9-63 of the offset into the image file at which the
+         9 - 55:    Bits 9-55 of the offset into the image file at which the
                     refcount block starts. Must be aligned to a cluster
                     boundary.

@@ -349,6 +366,8 @@ Refcount table entry:
                     been allocated. All refcounts managed by this refcount block
                     are 0.

+        56 - 63:    Reserved (set to 0)
+
 Refcount block entry (x = refcount_bits - 1):

     Bit  0 -  x:    Reference count of the cluster. If refcount_bits implies a
@@ -365,6 +384,17 @@ The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
 exactly one cluster in size.

+The L1 and L2 tables have implications on the maximum virtual file
+size; a larger cluster size is required for the guest to have access
+to more space.  Furthermore, a virtual cluster must currently map to a
+host offset below 64 PB (56 bits) (this limit could be enlarged by
+putting reserved bits into use, but only if a similar limit on
+refcount tables is revisited at the same time).  Additionally, with
+larger cluster sizes, compressed clusters have a smaller limit on host
+cluster mappings (a 2M cluster size requires compressed clusters to
+reside below 512 TB (49 bits), where enlarging this would require an
+incompatible layout change).
+
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows:

@@ -427,7 +457,9 @@ Standard Cluster Descriptor:
 Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

     Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
-                    cluster or sector boundary!
+                    cluster or sector boundary!  If cluster_bits is
+                    small enough that this field includes bits beyond
+                    55, those upper bits must be set to 0.

          x - 61:    Number of additional 512-byte sectors used for the
                     compressed data, beyond the sector containing the offset
-- 
2.14.3
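
The figures in the patch can be cross-checked with a short calculation. The following is only a sketch, not QEMU code: the function names are hypothetical, and the 32 MB cap on the L1 table is an assumption about QEMU's implementation limit rather than something the spec states. It derives the maximum virtual size from the L1/L2 geometry and the maximum host offset of a compressed cluster from the descriptor formula x = 62 - (cluster_bits - 8), clamped to the 56-bit limit.

```python
# Assumption: QEMU caps the L1 table at 32 MB; the spec itself has no such cap.
ASSUMED_MAX_L1_BYTES = 32 * 1024 * 1024


def max_virtual_size(cluster_bits, l1_bytes=ASSUMED_MAX_L1_BYTES):
    """Largest virtual disk size addressable through the L1/L2 tables."""
    l1_entries = l1_bytes // 8                  # L1 entries are 64 bits wide
    l2_entries = (1 << cluster_bits) // 8       # an L2 table is one cluster
    return (l1_entries * l2_entries) << cluster_bits


def compressed_offset_bits(cluster_bits):
    """Usable width of the host offset field in a compressed descriptor."""
    x = 62 - (cluster_bits - 8)                 # formula from the spec
    return min(x, 56)                           # bits above 55 must be 0


def max_compressed_host_offset(cluster_bits):
    """Exclusive upper bound on the host offset of a compressed cluster."""
    return 1 << compressed_offset_bits(cluster_bits)


if __name__ == "__main__":
    for bits in (9, 16, 21):
        print(bits, max_virtual_size(bits), max_compressed_host_offset(bits))
```

Under these assumptions, 2 MB clusters (cluster_bits = 21) give a 2 EB virtual size and a 512 TB limit for compressed clusters, while 512 byte clusters (cluster_bits = 9) give 128 GB and the full 64 PB.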


Re: [Qemu-devel] [PATCH v4 2/5] qcow2: Document some maximum size constraints
Posted by Alberto Garcia 7 years, 7 months ago
On Tue 27 Feb 2018 05:29:41 PM CET, Eric Blake wrote:
> +The refcount table has implications on the maximum host file size; a
> +larger cluster size is required for the refcount table to cover larger
> +offsets.

Why is this? Because of the refcount_table_clusters field ?

I think the maximum offset allowed by that is ridiculously high,
exceeding any other limit imposed by the L1/L2 tables.

If my numbers are right, with the default values that's 64 ZB.

In addition to that, the size that can be covered by the refcount table
also depends on the size of refcount entries (refcount_order).

With 512 byte clusters and 64 bit refcount entries I still get 8 PB, way
over what's limited by the L1/L2 tables (128 GB).
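
For reference, the arithmetic behind these two numbers can be sketched as follows. The function names are hypothetical, and the sketch assumes the 32-bit refcount_table_clusters field allows roughly 2^32 clusters of refcount table (strictly 2^32 - 1, rounded up here to keep the powers of two clean).

```python
def reftable_coverage(cluster_bits, refcount_order, reftable_bytes):
    """Bytes of host file that a refcount table of the given size can cover."""
    cluster_size = 1 << cluster_bits
    reftable_entries = reftable_bytes // 8           # 64-bit table entries
    refcount_bits = 1 << refcount_order              # width of one refcount
    refblock_entries = cluster_size * 8 // refcount_bits
    return reftable_entries * refblock_entries * cluster_size


def spec_max_coverage(cluster_bits, refcount_order):
    """Coverage with the largest refcount table the header field can describe."""
    # Assumption: about 2**32 clusters of refcount table (32-bit header field).
    max_reftable_bytes = (1 << 32) << cluster_bits
    return reftable_coverage(cluster_bits, refcount_order, max_reftable_bytes)


if __name__ == "__main__":
    ZB, PB = 1 << 70, 1 << 50
    print(spec_max_coverage(16, 4) // ZB, "ZB")   # 64k clusters, 16-bit refcounts
    print(spec_max_coverage(9, 6) // PB, "PB")    # 512 byte clusters, 64-bit refcounts
```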

Berto

Re: [Qemu-devel] [PATCH v4 2/5] qcow2: Document some maximum size constraints
Posted by Eric Blake 7 years, 7 months ago
On 02/28/2018 04:26 AM, Alberto Garcia wrote:
> On Tue 27 Feb 2018 05:29:41 PM CET, Eric Blake wrote:
>> +The refcount table has implications on the maximum host file size; a
>> +larger cluster size is required for the refcount table to cover larger
>> +offsets.
> 
> Why is this? Because of the refcount_table_clusters field ?
> 
> I think the maximum offset allowed by that is ridiculously high,
> exceeding any other limit imposed by the L1/L2 tables.

Good point.  I was basing my comment off of qcow2.h:

/* 8 MB refcount table is enough for 2 PB images at 64k cluster size
  * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
#define QCOW_MAX_REFTABLE_SIZE 0x800000

But that's our implementation choice (we cap the amount of memory we are
willing to spend on the refcount table, while the qcow2 spec would allow
an implementation willing to reserve more memory to cover even larger
images).
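
The three figures in that comment follow from the table geometry. A minimal sketch, assuming the default 16-bit refcounts (refcount_order = 4); the function name is hypothetical and this is not the code in qcow2.h:

```python
QCOW_MAX_REFTABLE_SIZE = 0x800000   # 8 MB, value quoted from qcow2.h above


def reftable_coverage(cluster_bits, refcount_order=4,
                      reftable_bytes=QCOW_MAX_REFTABLE_SIZE):
    """Bytes of host file covered by a refcount table of reftable_bytes."""
    cluster_size = 1 << cluster_bits
    reftable_entries = reftable_bytes // 8           # 64-bit table entries
    refcount_bits = 1 << refcount_order              # 16 bits by default
    refblock_entries = cluster_size * 8 // refcount_bits
    return reftable_entries * refblock_entries * cluster_size


if __name__ == "__main__":
    GB, PB, EB = 1 << 30, 1 << 50, 1 << 60
    print(reftable_coverage(16) // PB, "PB at 64k clusters")
    print(reftable_coverage(9) // GB, "GB at 512 byte clusters")
    print(reftable_coverage(21) // EB, "EB at 2 MB clusters")
```

Passing a larger reftable_bytes shows how an implementation with a higher memory cap would cover larger host files under the same on-disk format.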

> 
> If my numbers are right, with the default values that's 64 ZB.
> 
> In addition to that, the size that can be covered by the refcount table
> also depends on the size of refcount entries (refcount_order).

True.

> 
> With 512 byte clusters and 64 bit refcount entries I still get 8 PB, way
> over what's limited by the L1/L2 tables (128 GB).

Do I need to make any modifications to the sentence, then?  Or is it 
still accurate, if vague, to leave the sentence as is because there IS 
an impact to consider, even if the impact is unlikely to matter in 
relation to other sizing impacts?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v4 2/5] qcow2: Document some maximum size constraints
Posted by Alberto Garcia 7 years, 7 months ago
On Wed 28 Feb 2018 03:01:33 PM CET, Eric Blake wrote:

>>> The refcount table has implications on the maximum host file size; a
>>> larger cluster size is required for the refcount table to cover
>>> larger offsets.
>> 
>> Why is this? Because of the refcount_table_clusters field ?
>> 
>> I think the maximum offset allowed by that is ridiculously high,
>> exceeding any other limit imposed by the L1/L2 tables.
   [...]
>> With 512 byte clusters and 64 bit refcount entries I still get 8 PB,
>> way over what's limited by the L1/L2 tables (128 GB).
>
> Do I need to make any modifications to the sentence, then?

I guess what surprised me the first time that I read it was that it
suggests that this has to be taken into account when calculating the
physical limits of an image, while in practice it can be ignored.

You could say something like 

  Although the larger the cluster size, the larger the offsets that can
  be covered by the refcount table, in practice these limits cannot be
  reached because they are larger than the ones imposed by other data
  structures.

although I'm sure that you can come up with a better wording than mine :)

Berto

Re: [Qemu-devel] [PATCH v4 2/5] qcow2: Document some maximum size constraints
Posted by Max Reitz 7 years, 5 months ago
On 2018-02-28 15:20, Alberto Garcia wrote:
> On Wed 28 Feb 2018 03:01:33 PM CET, Eric Blake wrote:
> 
>>>> The refcount table has implications on the maximum host file size; a
>>>> larger cluster size is required for the refcount table to cover
>>>> larger offsets.
>>>
>>> Why is this? Because of the refcount_table_clusters field ?
>>>
>>> I think the maximum offset allowed by that is ridiculously high,
>>> exceeding any other limit imposed by the L1/L2 tables.
>    [...]
>>> With 512 byte clusters and 64 bit refcount entries I still get 8 PB,
>>> way over what's limited by the L1/L2 tables (128 GB).
>>
>> Do I need to make any modifications to the sentence, then?
> 
> I guess what surprised me the first time that I read it was that it
> suggests that this has to be taken into account when calculating the
> physical limits of an image, while in practice it can be ignored.
> 
> You could say something like 
> 
>   Although the larger the cluster size, the larger the offsets that can
>   be covered by the refcount table, in practice these limits cannot be
>   reached because they are larger than the ones imposed by other data
>   structures.

Are there any updates here?  I guess I personally would just drop the
whole paragraph, because I think it really doesn't matter...

Also note that the maximum file size of ext4 is 16 PB (for 4 kB blocks).
OK, it's bigger for XFS, but that still gives some perspective.

Also, long before anyone is going to complain about the specification
failing to mention that limit, they are going to complain that qemu
refuses to open their image (because of its limit on the reftable size).

Max

> although I'm sure that you can come up with a better wording than mine :)
> 
> Berto
> 


Re: [Qemu-devel] [PATCH v4 2/5] qcow2: Document some maximum size constraints
Posted by Eric Blake 7 years, 5 months ago
On 04/13/2018 12:08 PM, Max Reitz wrote:

>>>> With 512 byte clusters and 64 bit refcount entries I still get 8 PB,
>>>> way over what's limited by the L1/L2 tables (128 GB).
>>>
>>> Do I need to make any modifications to the sentence, then?
>>
>> I guess what surprised me the first time that I read it was that it
>> suggests that this has to be taken into account when calculating the
>> physical limits of an image, while in practice it can be ignored.
>>
>> You could say something like 
>>
>>   Although the larger the cluster size, the larger the offsets that can
>>   be covered by the refcount table, in practice these limits cannot be
>>   reached because they are larger than the ones imposed by other data
>>   structures.
> 
> Are there any updates here?  I guess I personally would just drop the
> whole paragraph, because I think it really doesn't matter...

Yeah, I need to post a v5 of this series now that 2.13 is nearly open.

> 
> Also note that the maximum file size of ext4 is 16 PB (for 4 kB blocks).
> OK, it's bigger for XFS, but that still gives some perspective.
> 
> Also, long before anyone is going to complain about the specification
> failing to mention that limit, they are going to complain that qemu
> refuses to open their image (because of its limit on the reftable size).
> 
> Max
> 
>> although I'm sure that you can come up with a better wording than mine :)
>>
>> Berto
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org