[Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots

Alberto Garcia posted 1 patch 6 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180209113744.11842-1-berto@igalia.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test ppcbe passed
Test ppcle passed
Test s390x passed
block/qcow2-snapshot.c     | 14 ++++++++++----
tests/qemu-iotests/080     | 10 +++++++++-
tests/qemu-iotests/080.out | 10 ++++++++--
3 files changed, 27 insertions(+), 7 deletions(-)
[Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Alberto Garcia 6 years, 2 months ago
The code that reads the qcow2 snapshot table takes the offset and size
of all snapshots' L1 table without doing any kind of checks.

Although qcow2_snapshot_load_tmp() does verify that the table size is
valid, the table offset is not checked at all. On top of that there
are several other code paths that deal with internal snapshots that
don't use that function or do any other check.

This patch verifies that all L1 tables are correctly aligned and the
size does not exceed the maximum allowed valued.

The check from qcow2_snapshot_load_tmp() is removed since it's now
useless (opening an image will fail before that).

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-snapshot.c     | 14 ++++++++++----
 tests/qemu-iotests/080     | 10 +++++++++-
 tests/qemu-iotests/080.out | 10 ++++++++--
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 44243e0e95..b9d46d95fa 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -121,6 +121,16 @@ int qcow2_read_snapshots(BlockDriverState *bs)
         offset += name_size;
         sn->name[name_size] = '\0';
 
+        if (offset_into_cluster(s, sn->l1_table_offset)) {
+            ret = -EINVAL;
+            goto fail;
+        }
+
+        if (sn->l1_size > QCOW_MAX_L1_SIZE / sizeof(uint64_t)) {
+            ret = -EFBIG;
+            goto fail;
+        }
+
         if (offset - s->snapshots_offset > QCOW_MAX_SNAPSHOTS_SIZE) {
             ret = -EFBIG;
             goto fail;
@@ -704,10 +714,6 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
     sn = &s->snapshots[snapshot_index];
 
     /* Allocate and read in the snapshot's L1 table */
-    if (sn->l1_size > QCOW_MAX_L1_SIZE / sizeof(uint64_t)) {
-        error_setg(errp, "Snapshot L1 table too large");
-        return -EFBIG;
-    }
     new_l1_bytes = sn->l1_size * sizeof(uint64_t);
     new_l1_table = qemu_try_blockalign(bs->file->bs,
                                        align_offset(new_l1_bytes, 512));
diff --git a/tests/qemu-iotests/080 b/tests/qemu-iotests/080
index 1c2bd85742..dec6749908 100755
--- a/tests/qemu-iotests/080
+++ b/tests/qemu-iotests/080
@@ -171,7 +171,15 @@ poke_file "$TEST_IMG" "$offset_l2_table_0" "\x80\x00\x00\xff\xff\xff\x00\x00"
 { $QEMU_IMG snapshot -c test $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
-echo "== Invalid snapshot L1 table =="
+echo "== Invalid snapshot L1 table offset =="
+_make_test_img 64M
+{ $QEMU_IO -c "write 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IMG snapshot -c test $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+poke_file "$TEST_IMG" "$offset_snap1_l1_offset" "\x00\x00\x00\x00\x00\x40\x02\x00"
+{ $QEMU_IMG convert -s test $TEST_IMG $TEST_IMG.snap; } 2>&1 | _filter_testdir
+
+echo
+echo "== Invalid snapshot L1 table size =="
 _make_test_img 64M
 { $QEMU_IO -c "write 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
 { $QEMU_IMG snapshot -c test $TEST_IMG; } 2>&1 | _filter_testdir
diff --git a/tests/qemu-iotests/080.out b/tests/qemu-iotests/080.out
index 6a7fda1356..86ed9567b7 100644
--- a/tests/qemu-iotests/080.out
+++ b/tests/qemu-iotests/080.out
@@ -59,9 +59,15 @@ wrote 512/512 bytes at offset 0
 qemu-img: Could not create snapshot 'test': -27 (File too large)
 qemu-img: Could not create snapshot 'test': -11 (Resource temporarily unavailable)
 
-== Invalid snapshot L1 table ==
+== Invalid snapshot L1 table offset ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-qemu-img: Failed to load snapshot: Snapshot L1 table too large
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Could not read snapshots: Invalid argument
+
+== Invalid snapshot L1 table size ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Could not read snapshots: File too large
 *** done
-- 
2.11.0


Re: [Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Max Reitz 6 years, 2 months ago
On 2018-02-09 12:37, Alberto Garcia wrote:
> The code that reads the qcow2 snapshot table takes the offset and size
> of all snapshots' L1 table without doing any kind of checks.
> 
> Although qcow2_snapshot_load_tmp() does verify that the table size is
> valid, the table offset is not checked at all. On top of that there
> are several other code paths that deal with internal snapshots that
> don't use that function or do any other check.
> 
> This patch verifies that all L1 tables are correctly aligned and the
> size does not exceed the maximum allowed valued.
> 
> The check from qcow2_snapshot_load_tmp() is removed since it's now
> useless (opening an image will fail before that).
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-snapshot.c     | 14 ++++++++++----
>  tests/qemu-iotests/080     | 10 +++++++++-
>  tests/qemu-iotests/080.out | 10 ++++++++--
>  3 files changed, 27 insertions(+), 7 deletions(-)

I don't like completely refusing to open a qcow2 image if one of the
snapshots is invalid, without giving the user any way of fixing it.

With this patch, the final two images created in 080 cannot be opened at
all (not even with qemu-img info).  Without it, you can, you just can't
load the snapshots.  (Well, in one case.  In the other, you can even do
that, but that's a bug, as you correctly claim.)

More importantly, though, without this patch, you can delete the
snapshots.  qemu-img will complain a bit, and you'll have leaks
afterwards, but that's nothing qemu-img check can't fix.  With this
patch, you can't, because the image cannot be opened at all so it's
basically gone for good (unless you get a hex editor).

Another thing is that the error messages are not really useful...
(which is already an issue, but this patch doesn't make it better.)

Max

Re: [Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Alberto Garcia 6 years, 2 months ago
On Fri 09 Feb 2018 02:04:06 PM CET, Max Reitz wrote:
> I don't like completely refusing to open a qcow2 image if one of the
> snapshots is invalid, without giving the user any way of fixing it.
>
> With this patch, the final two images created in 080 cannot be opened
> at all (not even with qemu-img info).  Without it, you can, you just
> can't load the snapshots.  (Well, in one case.  In the other, you can
> even do that, but that's a bug, as you correctly claim.)
>
> More importantly, though, without this patch, you can delete the
> snapshots.  qemu-img will complain a bit, and you'll have leaks
> afterwards, but that's nothing qemu-img check can't fix.  With this
> patch, you can't, because the image cannot be opened at all so it's
> basically gone for good (unless you get a hex editor).

What you're saying is correct. The alternative however is to add checks
everywhere the snapshot table is used.

There's for example qcow2_check_metadata_overlap() that will happily
allocate a buffer of (l1_size * 8) bytes, that can be up to 32 GB.

qcow2_expand_zero_clusters() has the same problem but there g_realloc()
is used, aborting the process on failure (we should actually use
g_try_realloc() instead, I'll leave that for a different patch).

If we want to allow opening images with corrupted snapshots perhaps we
could still do the checks in qcow2_read_snapshots() but instead of
returning an error add a 'corrupted' flag to QCowSnapshot.

That solution is not without problems because we'd still need to check
for that flag everytime the snapshot is used, and that's calling for
errors in the future.

Berto

Re: [Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Max Reitz 6 years, 2 months ago
On 2018-02-09 14:35, Alberto Garcia wrote:
> On Fri 09 Feb 2018 02:04:06 PM CET, Max Reitz wrote:
>> I don't like completely refusing to open a qcow2 image if one of the
>> snapshots is invalid, without giving the user any way of fixing it.
>>
>> With this patch, the final two images created in 080 cannot be opened
>> at all (not even with qemu-img info).  Without it, you can, you just
>> can't load the snapshots.  (Well, in one case.  In the other, you can
>> even do that, but that's a bug, as you correctly claim.)
>>
>> More importantly, though, without this patch, you can delete the
>> snapshots.  qemu-img will complain a bit, and you'll have leaks
>> afterwards, but that's nothing qemu-img check can't fix.  With this
>> patch, you can't, because the image cannot be opened at all so it's
>> basically gone for good (unless you get a hex editor).
> 
> What you're saying is correct. The alternative however is to add checks
> everywhere the snapshot table is used.

Yes.

> There's for example qcow2_check_metadata_overlap() that will happily
> allocate a buffer of (l1_size * 8) bytes, that can be up to 32 GB.

It will only do that with QCOW2_OL_INACTIVE_L2, though, which is not
enabled by default and will cost a ton of performance anyway, when used.

(And by the way, I'd be OK with dropping that branch.  Or with
revisiting my good old series that strived to make the overlap checks
faster in general.)

> qcow2_expand_zero_clusters() has the same problem but there g_realloc()
> is used, aborting the process on failure (we should actually use
> g_try_realloc() instead, I'll leave that for a different patch).

And this is only used for qemu-img amend from v3 to v2, offline.  While
aborts are bad, in this case they're not even that bad (because this is
not a common operation at all, and because it's fully offline).

And I'm still not sure how this patch would make anything better.
Without it, qemu-img amend aborts.  Yes, too bad.  So you can still do
qemu-img convert, losing all snapshots, but at least you got something.
(Or we just fix qemu-img amend, that tells you there's something bad
going on, and then you delete the snapshot in question.)

With this patch, your whole data is gone because qemu now refuses to
look at anything in the image.

> If we want to allow opening images with corrupted snapshots perhaps we
> could still do the checks in qcow2_read_snapshots() but instead of
> returning an error add a 'corrupted' flag to QCowSnapshot.

If you think that's simpler than having the proper checks everywhere we
access a snapshot, be my guest.  It does sound reasonable and it might
be nicer for users than having to figure out which snapshot to delete
themselves.

> That solution is not without problems because we'd still need to check
> for that flag everytime the snapshot is used, and that's calling for
> errors in the future.

Well, errors if your file is corrupted.  Not a good thing but I still
prefer it over rejecting files altogether because some non-vital data
structure is corrupted, with no chance of allowing the user to fix the
issue.

An intermediate thing would be to mark the whole image corrupt if some
snapshot is broken and then adding functionality to qemu-img check to
fix that (by deleting the snapshot, if the user has agreed to that).
But whatever we do, there needs to be a way for the user to get all
non-corrupted data off the image.

Max

Re: [Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Kevin Wolf 6 years, 2 months ago
Am 09.02.2018 um 14:04 hat Max Reitz geschrieben:
> On 2018-02-09 12:37, Alberto Garcia wrote:
> > The code that reads the qcow2 snapshot table takes the offset and size
> > of all snapshots' L1 table without doing any kind of checks.
> > 
> > Although qcow2_snapshot_load_tmp() does verify that the table size is
> > valid, the table offset is not checked at all. On top of that there
> > are several other code paths that deal with internal snapshots that
> > don't use that function or do any other check.
> > 
> > This patch verifies that all L1 tables are correctly aligned and the
> > size does not exceed the maximum allowed valued.
> > 
> > The check from qcow2_snapshot_load_tmp() is removed since it's now
> > useless (opening an image will fail before that).
> > 
> > Signed-off-by: Alberto Garcia <berto@igalia.com>
> > ---
> >  block/qcow2-snapshot.c     | 14 ++++++++++----
> >  tests/qemu-iotests/080     | 10 +++++++++-
> >  tests/qemu-iotests/080.out | 10 ++++++++--
> >  3 files changed, 27 insertions(+), 7 deletions(-)
> 
> I don't like completely refusing to open a qcow2 image if one of the
> snapshots is invalid, without giving the user any way of fixing it.
> 
> With this patch, the final two images created in 080 cannot be opened at
> all (not even with qemu-img info).  Without it, you can, you just can't
> load the snapshots.  (Well, in one case.  In the other, you can even do
> that, but that's a bug, as you correctly claim.)
> 
> More importantly, though, without this patch, you can delete the
> snapshots.  qemu-img will complain a bit, and you'll have leaks
> afterwards, but that's nothing qemu-img check can't fix.  With this
> patch, you can't, because the image cannot be opened at all so it's
> basically gone for good (unless you get a hex editor).

How about we move the check to bdrv_open() as proposed, but make it
conditional so that it's skipped with BDRV_O_CHECK and then add a way to
fix the situation with qemu-img check -r?

Kevin
Re: [Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Alberto Garcia 6 years, 2 months ago
On Fri 09 Feb 2018 04:03:31 PM CET, Kevin Wolf wrote:
> How about we move the check to bdrv_open() as proposed, but make it
> conditional so that it's skipped with BDRV_O_CHECK and then add a way
> to fix the situation with qemu-img check -r?

That was one of the alternatives that I was considering, but you can't
really fix a broken L1 table pointer, can you?

What you can probably do is allow deleting the snapshot, but 'qemu-img
snapshot' doesn't use BDRV_O_CHECK. We'd have to change that, look for a
new flag or think of some other alternative.

Or, perhaps more easily, make 'qemu-img check' delete all corrupted
snapshots.

Berto

Re: [Qemu-devel] [PATCH] qcow2: Check the L1 table parameters from all internal snapshots
Posted by Kevin Wolf 6 years, 2 months ago
Am 09.02.2018 um 16:11 hat Alberto Garcia geschrieben:
> On Fri 09 Feb 2018 04:03:31 PM CET, Kevin Wolf wrote:
> > How about we move the check to bdrv_open() as proposed, but make it
> > conditional so that it's skipped with BDRV_O_CHECK and then add a way
> > to fix the situation with qemu-img check -r?
> 
> That was one of the alternatives that I was considering, but you can't
> really fix a broken L1 table pointer, can you?
> 
> What you can probably do is allow deleting the snapshot, but 'qemu-img
> snapshot' doesn't use BDRV_O_CHECK. We'd have to change that, look for a
> new flag or think of some other alternative.
> 
> Or, perhaps more easily, make 'qemu-img check' delete all corrupted
> snapshots.

Yes, the latter is what I was thinking of. Of course, that would require
a new -r option because it would be a destructive operation.

Maybe it would even be worth making such destructive operations
interactive so that the user has to explicitly confirm each snapshot
that we delete. Not sure though.

Kevin