If we managed to allocate the clusters, but then failed to write the
data, there's a good chance that we'll still be able to free the
clusters again in order to avoid cluster leaks (the refcounts are
cached, so even if we can't write them out right now, we may be able to
do so when the VM is resumed after a werror=stop/enospc pause).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.h         |  1 +
 block/qcow2-cluster.c | 11 +++++++++++
 block/qcow2.c         |  2 ++
 3 files changed, 14 insertions(+)
diff --git a/block/qcow2.h b/block/qcow2.h
index 01b5250415..1c9c0d3631 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -614,6 +614,7 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                          int compressed_size);
 
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
+void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m);
 int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
                           uint64_t bytes, enum qcow2_discard_type type,
                           bool full_discard);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0d74584c9b..d37fe08b3d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -994,6 +994,17 @@ err:
     return ret;
  }
 
+/**
+ * Frees the allocated clusters because the request failed and they won't
+ * actually be linked.
+ */
+void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
+{
+    BDRVQcow2State *s = bs->opaque;
+    qcow2_free_clusters(bs, m->alloc_offset, m->nb_clusters << s->cluster_bits,
+                        QCOW2_DISCARD_NEVER);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
diff --git a/block/qcow2.c b/block/qcow2.c
index da54b2c8f8..7a4db12afd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1777,6 +1777,8 @@ static coroutine_fn int qcow2_handle_l2meta(BlockDriverState *bs,
             if (ret) {
                 goto out;
             }
+        } else {
+            qcow2_alloc_cluster_abort(bs, l2meta);
         }
 
         /* Take the request off the list of running requests */
-- 
2.13.6
                
On 06/28/2018 10:39 AM, Kevin Wolf wrote:
> If we managed to allocate the clusters, but then failed to write the
> data, there's a good chance that we'll still be able to free the
> clusters again in order to avoid cluster leaks (the refcounts are
> cached, so even if we can't write them out right now, we may be able to
> do so when the VM is resumed after a werror=stop/enospc pause).
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2.h         |  1 +
>   block/qcow2-cluster.c | 11 +++++++++++
>   block/qcow2.c         |  2 ++
>   3 files changed, 14 insertions(+)
Hmm, I wonder if this interacts poorly with my proposed test addition 
for HUGE images, which is still pending review:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05488.html
https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg04542.html
(time for me to go testing...)
> +++ b/block/qcow2.c
> @@ -1777,6 +1777,8 @@ static coroutine_fn int qcow2_handle_l2meta(BlockDriverState *bs,
>               if (ret) {
>                   goto out;
>               }
> +        } else {
> +            qcow2_alloc_cluster_abort(bs, l2meta);
>           }
None of our existing tests caught this? But it looks right to me.
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
                
On 06/28/2018 10:49 AM, Eric Blake wrote:
> On 06/28/2018 10:39 AM, Kevin Wolf wrote:
>> If we managed to allocate the clusters, but then failed to write the
>> data, there's a good chance that we'll still be able to free the
>> clusters again in order to avoid cluster leaks (the refcounts are
>> cached, so even if we can't write them out right now, we may be able to
>> do so when the VM is resumed after a werror=stop/enospc pause).
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>   block/qcow2.h         |  1 +
>>   block/qcow2-cluster.c | 11 +++++++++++
>>   block/qcow2.c         |  2 ++
>>   3 files changed, 14 insertions(+)
>
> Hmm, I wonder if this interacts poorly with my proposed test addition
> for HUGE images, which is still pending review:
> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05488.html
> https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg04542.html
>
> (time for me to go testing...)

Phew, my test didn't change (but I will go ahead and repost a v7 to fix
the minor conflict in iotest numbering). And that means I can also add
for this series:

Tested-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org