[v3] qcow2: keep reference on zeroize with discard-no-unref enabled

[PATCH v3] qcow2: keep reference on zeroize with discard-no-unref enabled

Posted by Jean-Louis Dupond 7 months ago

When the discard-no-unref flag is enabled, we keep the reference for
normal discard requests.
But when a discard is executed on a snapshot/qcow2 image with a backing
file, the discards are saved as zero clusters in the snapshot image.

When committing the snapshot to the backing file, zero_in_l2_slice is
called instead of discard_in_l2_slice, and zero_in_l2_slice did not have
any logic to keep the reference when discard-no-unref is enabled.

Therefore we add logic in the zero_in_l2_slice call to keep the reference
on commit.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
---
 block/qcow2-cluster.c | 22 ++++++++++++++++++----
 qapi/block-core.json  |  7 ++++---
 qemu-options.hx       |  3 ++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f4f6cd6ad0..fc764aea4d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1984,7 +1984,7 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
             /* If we keep the reference, pass on the discard still */
             bdrv_pdiscard(s->data_file, old_l2_entry & L2E_OFFSET_MASK,
                           s->cluster_size);
-       }
+        }
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
@@ -2062,9 +2062,15 @@ zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
         bool unmap = (type == QCOW2_CLUSTER_COMPRESSED) ||
             ((flags & BDRV_REQ_MAY_UNMAP) && qcow2_cluster_is_allocated(type));
-        uint64_t new_l2_entry = unmap ? 0 : old_l2_entry;
+        bool keep_reference =
+            (s->discard_no_unref && type != QCOW2_CLUSTER_COMPRESSED);
+        uint64_t new_l2_entry = old_l2_entry;
         uint64_t new_l2_bitmap = old_l2_bitmap;
 
+        if (unmap && !keep_reference) {
+            new_l2_entry = 0;
+        }
+
         if (has_subclusters(s)) {
             new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
         } else {
@@ -2082,9 +2088,17 @@ zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
             set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
         }
 
-        /* Then decrease the refcount */
         if (unmap) {
-            qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+            if (!keep_reference) {
+                /* Then decrease the refcount */
+                qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+            } else if (s->discard_passthrough[QCOW2_DISCARD_REQUEST] &&
+                       (type == QCOW2_CLUSTER_NORMAL ||
+                        type == QCOW2_CLUSTER_ZERO_ALLOC)) {
+                /* If we keep the reference, pass on the discard still */
+                bdrv_pdiscard(s->data_file, old_l2_entry & L2E_OFFSET_MASK,
+                              s->cluster_size);
+            }
         }
     }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 89751d81f2..9836195850 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3476,15 +3476,16 @@
 #     should be issued on other occasions where a cluster gets freed
 #
 # @discard-no-unref: when enabled, discards from the guest will not
-#     cause cluster allocations to be relinquished.  This prevents
+#     cause cluster allocations to be relinquished. The same will
+#     happen for discards triggered by zeroize. This prevents
 #     qcow2 fragmentation that would be caused by such discards.
 #     Besides potential performance degradation, such fragmentation
 #     can lead to increased allocation of clusters past the end of the
 #     image file, resulting in image files whose file length can grow
-#     much larger than their guest disk size would suggest.  If image
+#     much larger than their guest disk size would suggest. If image
 #     file length is of concern (e.g. when storing qcow2 images
 #     directly on block devices), you should consider enabling this
-#     option.  (since 8.1)
+#     option. (since 8.1)
 #
 # @overlap-check: which overlap checks to perform for writes to the
 #     image, defaults to 'cached' (since 2.2)
diff --git a/qemu-options.hx b/qemu-options.hx
index bcd77255cb..3f31e71e4d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1452,7 +1452,8 @@ SRST
 
         ``discard-no-unref``
             When enabled, discards from the guest will not cause cluster
-            allocations to be relinquished. This prevents qcow2 fragmentation
+            allocations to be relinquished. The same will happen for
+            discards triggered by zeroize. This prevents qcow2 fragmentation
             that would be caused by such discards. Besides potential
             performance degradation, such fragmentation can lead to increased
             allocation of clusters past the end of the image file,
-- 
2.42.0

Re: [PATCH v3] qcow2: keep reference on zeroize with discard-no-unref enabled

Posted by Hanna Czenczek 6 months, 1 week ago

On 03.10.23 14:52, Jean-Louis Dupond wrote:
> When the discard-no-unref flag is enabled, we keep the reference for
> normal discard requests.
> But when a discard is executed on a snapshot/qcow2 image with a backing
> file, the discards are saved as zero clusters in the snapshot image.
>
> When committing the snapshot to the backing file, zero_in_l2_slice is
> called instead of discard_in_l2_slice, and zero_in_l2_slice did not have
> any logic to keep the reference when discard-no-unref is enabled.
>
> Therefore we add logic in the zero_in_l2_slice call to keep the reference
> on commit.
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
> ---
>   block/qcow2-cluster.c | 22 ++++++++++++++++++----
>   qapi/block-core.json  |  7 ++++---
>   qemu-options.hx       |  3 ++-
>   3 files changed, 24 insertions(+), 8 deletions(-)

Thanks, applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna

Re: [PATCH v3] qcow2: keep reference on zeroize with discard-no-unref enabled

Posted by Hanna Czenczek 6 months, 1 week ago

On 03.10.23 14:52, Jean-Louis Dupond wrote:
> When the discard-no-unref flag is enabled, we keep the reference for
> normal discard requests.
> But when a discard is executed on a snapshot/qcow2 image with a backing
> file, the discards are saved as zero clusters in the snapshot image.
>
> When committing the snapshot to the backing file, zero_in_l2_slice is
> called instead of discard_in_l2_slice, and zero_in_l2_slice did not have
> any logic to keep the reference when discard-no-unref is enabled.
>
> Therefore we add logic in the zero_in_l2_slice call to keep the reference
> on commit.
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
> ---
>   block/qcow2-cluster.c | 22 ++++++++++++++++++----
>   qapi/block-core.json  |  7 ++++---
>   qemu-options.hx       |  3 ++-
>   3 files changed, 24 insertions(+), 8 deletions(-)

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 89751d81f2..9836195850 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3476,15 +3476,16 @@
>   #     should be issued on other occasions where a cluster gets freed
>   #
>   # @discard-no-unref: when enabled, discards from the guest will not
> -#     cause cluster allocations to be relinquished.  This prevents
> +#     cause cluster allocations to be relinquished. The same will
> +#     happen for discards triggered by zeroize. This prevents

I don’t think “zeroize” has any meaning outside of qemu’s qcow2 code.  
I’d write “when enabled, data clusters will remain preallocated when 
they are no longer used, e.g. because they are discarded or converted to 
zero clusters.  As usual, whether the old data is discarded or kept on 
the protocol level (i.e. in the image file) depends on the setting of 
the pass-discard-request option. Keeping the clusters preallocated 
prevents qcow2 fragmentation that would otherwise be caused by freeing 
and re-allocating them later. Besides potential performance degradation, 
[...]”

If you’re OK with that, I can change that (here and in qemu-options.hx) 
when taking the patch.

>   #     qcow2 fragmentation that would be caused by such discards.
>   #     Besides potential performance degradation, such fragmentation
>   #     can lead to increased allocation of clusters past the end of the
>   #     image file, resulting in image files whose file length can grow
> -#     much larger than their guest disk size would suggest.  If image
> +#     much larger than their guest disk size would suggest. If image
>   #     file length is of concern (e.g. when storing qcow2 images
>   #     directly on block devices), you should consider enabling this
> -#     option.  (since 8.1)
> +#     option. (since 8.1)

These two changes don’t seem related, I’d remove them, too. 
(Double-space after '.' is fairly common in block-core.json, and in my 
emails, too. :))

Hanna

Re: [PATCH v3] qcow2: keep reference on zeroize with discard-no-unref enabled

Posted by Jean-Louis Dupond 6 months, 1 week ago

On 27/10/2023 11:49, Hanna Czenczek wrote:
> On 03.10.23 14:52, Jean-Louis Dupond wrote:
>> When the discard-no-unref flag is enabled, we keep the reference for
>> normal discard requests.
>> But when a discard is executed on a snapshot/qcow2 image with a backing
>> file, the discards are saved as zero clusters in the snapshot image.
>>
>> When committing the snapshot to the backing file, zero_in_l2_slice is
>> called instead of discard_in_l2_slice, and zero_in_l2_slice did not have
>> any logic to keep the reference when discard-no-unref is enabled.
>>
>> Therefore we add logic in the zero_in_l2_slice call to keep the reference
>> on commit.
>>
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>>   block/qcow2-cluster.c | 22 ++++++++++++++++++----
>>   qapi/block-core.json  |  7 ++++---
>>   qemu-options.hx       |  3 ++-
>>   3 files changed, 24 insertions(+), 8 deletions(-)
>
> [...]
>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 89751d81f2..9836195850 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3476,15 +3476,16 @@
>>   #     should be issued on other occasions where a cluster gets freed
>>   #
>>   # @discard-no-unref: when enabled, discards from the guest will not
>> -#     cause cluster allocations to be relinquished.  This prevents
>> +#     cause cluster allocations to be relinquished. The same will
>> +#     happen for discards triggered by zeroize. This prevents
>
> I don’t think “zeroize” has any meaning outside of qemu’s qcow2 code.  
> I’d write “when enabled, data clusters will remain preallocated when 
> they are no longer used, e.g. because they are discarded or converted 
> to zero clusters.  As usual, whether the old data is discarded or kept 
> on the protocol level (i.e. in the image file) depends on the setting 
> of the pass-discard-request option. Keeping the clusters preallocated 
> prevents qcow2 fragmentation that would otherwise be caused by freeing 
> and re-allocating them later. Besides potential performance 
> degradation, [...]”
>
> If you’re OK with that, I can change that (here and in 
> qemu-options.hx) when taking the patch.

Perfect!

>
>>   #     qcow2 fragmentation that would be caused by such discards.
>>   #     Besides potential performance degradation, such fragmentation
>>   #     can lead to increased allocation of clusters past the end of the
>>   #     image file, resulting in image files whose file length can grow
>> -#     much larger than their guest disk size would suggest.  If image
>> +#     much larger than their guest disk size would suggest. If image
>>   #     file length is of concern (e.g. when storing qcow2 images
>>   #     directly on block devices), you should consider enabling this
>> -#     option.  (since 8.1)
>> +#     option. (since 8.1)
>
> These two changes don’t seem related, I’d remove them, too. 
> (Double-space after '.' is fairly common in block-core.json, and in my 
> emails, too. :))
Fine
>
> Hanna
>

Thanks
Jean-Louis