Commit 1f4ad7d fixed 'qemu-img info' for raw images that are currently
in use as a mirror target. It is not enough for image formats, though,
as these still unconditionally request BLK_PERM_CONSISTENT_READ.
As this permission is meaningless unless you do actual I/O on the image,
drop the requirement and allow 'qemu-img info' even for image formats
under conditions where BLK_PERM_CONSISTENT_READ can't be granted.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 9a1a0d1e73..ddb6836c52 100644
--- a/block.c
+++ b/block.c
@@ -1924,6 +1924,8 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
assert(role == &child_backing || role == &child_file);
if (!backing) {
+ int flags = bdrv_reopen_get_flags(reopen_queue, bs);
+
/* Apart from the modifications below, the same permissions are
* forwarded and left alone as for filters */
bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared,
@@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
/* bs->file always needs to be consistent because of the metadata. We
* can never allow other users to resize or write to it. */
- perm |= BLK_PERM_CONSISTENT_READ;
+ if (!(flags & BDRV_O_NO_IO)) {
+ perm |= BLK_PERM_CONSISTENT_READ;
+ }
shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
} else {
/* We want consistent read from backing files if the parent needs it.
--
2.13.6
On 11/30/2017 10:44 AM, Kevin Wolf wrote: > Commit 1f4ad7d fixed 'qemu-img info' for raw images that are currently > in use as a mirror target. It is not enough for image formats, though, > as these still unconditionally request BLK_PERM_CONSISTENT_READ. > > As this permission is meaningless unless you do actual I/O on the image, > drop the requirement and allow 'qemu-img info' even for image formats > under conditions where BLK_PERM_CONSISTENT_READ can't be granted. > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > block.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > @@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c, > > /* bs->file always needs to be consistent because of the metadata. We > * can never allow other users to resize or write to it. */ > - perm |= BLK_PERM_CONSISTENT_READ; > + if (!(flags & BDRV_O_NO_IO)) { > + perm |= BLK_PERM_CONSISTENT_READ; I thought BDRV_O_NO_IO only means we aren't doing I/O on guest-visible data, but doesn't stop us from reading the metadata. The comment is telling: if we can read metadata, then we depend on CONSISTENT_READ for the metadata to be stable (even if we don't care about guest data consistency). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Am 30.11.2017 um 18:09 hat Eric Blake geschrieben: > On 11/30/2017 10:44 AM, Kevin Wolf wrote: > > Commit 1f4ad7d fixed 'qemu-img info' for raw images that are currently > > in use as a mirror target. It is not enough for image formats, though, > > as these still unconditionally request BLK_PERM_CONSISTENT_READ. > > > > As this permission is meaningless unless you do actual I/O on the image, > > drop the requirement and allow 'qemu-img info' even for image formats > > under conditions where BLK_PERM_CONSISTENT_READ can't be granted. > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > block.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > @@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c, > > /* bs->file always needs to be consistent because of the metadata. We > > * can never allow other users to resize or write to it. */ > > - perm |= BLK_PERM_CONSISTENT_READ; > > + if (!(flags & BDRV_O_NO_IO)) { > > + perm |= BLK_PERM_CONSISTENT_READ; > > I thought BDRV_O_NO_IO only means we aren't doing I/O on guest-visible data, > but doesn't stop us from reading the metadata. The comment is telling: if > we can read metadata, then we depend on CONSISTENT_READ for the metadata to > be stable (even if we don't care about guest data consistency). Yes and no. The trouble is that at the file system level we have only a single bit to describe the consistency of the whole image throughout the whole block driver tree. We forbid shared BLK_PERM_CONSISTENT_READ for mirror targets (which aren't fully populated yet) and intermediate nodes for commit (which expose corrupted data). Both scenarios are really about the data exposed at the format layer, the metadata stays completely consistent. The question is what we do with this as we propagate permissions down to the protocol layer. Strictly speaking, the file at the protocol layer is perfectly consistent, so we might not forbid BLK_PERM_CONSISTENT_READ there. But I think it's more useful to do it anyway so that image locking can prevent the typical case of another process that uses qcow2 over file-posix again, where the file-posix node could in theory be considered consistent, but the qcow2 one wouldn't. In the end, this is just a pragmatic way to let 'qemu-img info' work while the image is a mirror target or intermediate node for commit, but to forbid cases where corrupted data is used. Or would you argue that either 'qemu-img info' shouldn't be working or reading corrupted data should be allowed in other processes? Kevin
On 11/30/2017 12:27 PM, Kevin Wolf wrote: >>> @@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c, >>> /* bs->file always needs to be consistent because of the metadata. We >>> * can never allow other users to resize or write to it. */ >>> - perm |= BLK_PERM_CONSISTENT_READ; >>> + if (!(flags & BDRV_O_NO_IO)) { >>> + perm |= BLK_PERM_CONSISTENT_READ; >> >> I thought BDRV_O_NO_IO only means we aren't doing I/O on guest-visible data, >> but doesn't stop us from reading the metadata. The comment is telling: if >> we can read metadata, then we depend on CONSISTENT_READ for the metadata to >> be stable (even if we don't care about guest data consistency). > > Yes and no. The trouble is that at the file system level we have only a > single bit to describe the consistency of the whole image throughout the > whole block driver tree. > > We forbid shared BLK_PERM_CONSISTENT_READ for mirror targets (which > aren't fully populated yet) and intermediate nodes for commit (which > expose corrupted data). Both scenarios are really about the data exposed > at the format layer, the metadata stays completely consistent. I guess it is the block of writes/resizes that prevents metadata from getting inconsistent; CONSISTENT_READ does indeed make more sense if interpreted solely in light of will the guest read consistent data (and not will the format layer see consistent contents from the protocol layer). > > The question is what we do with this as we propagate permissions down to > the protocol layer. Strictly speaking, the file at the protocol layer is > perfectly consistent, so we might not forbid BLK_PERM_CONSISTENT_READ > there. But I think it's more useful to do it anyway so that image > locking can prevent the typical case of another process that uses qcow2 > over file-posix again, where the file-posix node could in theory be > considered consistent, but the qcow2 one wouldn't. Does that mean we need two separate permission bits, one for whether the guest-visible data is consistent, and one for whether the metadata is consistent? But as I mentioned above, blocking writes should mean that no one else is messing with metadata. > > In the end, this is just a pragmatic way to let 'qemu-img info' work > while the image is a mirror target or intermediate node for commit, but > to forbid cases where corrupted data is used. > > Or would you argue that either 'qemu-img info' shouldn't be working or > reading corrupted data should be allowed in other processes? Having qemu-img info work on a mirror target makes sense; as you pointed out, the metadata flushed to disk is consistent (may have leaked clusters, but not broken image) at any given point by the writer (it has to be, since the writer has to be able to recover the image if there is an abrupt restart). And a second reader can still manage to see inconsistent data when reading two separate clusters if the timing is just right, regardless of whether the first writer is always able to see consistent data. So to some extent, it's a measure of how risky is the action? qemu-img info is read-only, and most of the time will just work; it is very hard to get to the rare race condition where two consecutive reads raced with a parallel writer combine to result in incorrect interpretation of the data by the reader. I'm not opposed to your patch, but am trying to make sure that I'm not overlooking any problem before giving R-b. Maybe it's just that the comment needs updating in v2. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Am 30.11.2017 um 20:05 hat Eric Blake geschrieben: > On 11/30/2017 12:27 PM, Kevin Wolf wrote: > > > > @@ -1936,7 +1938,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c, > > > > /* bs->file always needs to be consistent because of the metadata. We > > > > * can never allow other users to resize or write to it. */ > > > > - perm |= BLK_PERM_CONSISTENT_READ; > > > > + if (!(flags & BDRV_O_NO_IO)) { > > > > + perm |= BLK_PERM_CONSISTENT_READ; > > > > > > I thought BDRV_O_NO_IO only means we aren't doing I/O on guest-visible data, > > > but doesn't stop us from reading the metadata. The comment is telling: if > > > we can read metadata, then we depend on CONSISTENT_READ for the metadata to > > > be stable (even if we don't care about guest data consistency). > > > > Yes and no. The trouble is that at the file system level we have only a > > single bit to describe the consistency of the whole image throughout the > > whole block driver tree. > > > > We forbid shared BLK_PERM_CONSISTENT_READ for mirror targets (which > > aren't fully populated yet) and intermediate nodes for commit (which > > expose corrupted data). Both scenarios are really about the data exposed > > at the format layer, the metadata stays completely consistent. > > I guess it is the block of writes/resizes that prevents metadata from > getting inconsistent; CONSISTENT_READ does indeed make more sense if > interpreted solely in light of will the guest read consistent data > (and not will the format layer see consistent contents from the > protocol layer). Yes, that's what the write/resize locks are meant for. Consistent read is more about "this image may not contain the useful data you're expecting". > > The question is what we do with this as we propagate permissions down to > > the protocol layer. Strictly speaking, the file at the protocol layer is > > perfectly consistent, so we might not forbid BLK_PERM_CONSISTENT_READ > > there. But I think it's more useful to do it anyway so that image > > locking can prevent the typical case of another process that uses qcow2 > > over file-posix again, where the file-posix node could in theory be > > considered consistent, but the qcow2 one wouldn't. > > Does that mean we need two separate permission bits, one for whether > the guest-visible data is consistent, and one for whether the metadata > is consistent? But as I mentioned above, blocking writes should mean > that no one else is messing with metadata. Even this works only in the simple case of format over protocol. I think this becomes pretty clear when you think of the (admittedly unlikely) case of nested qcow2. Then you need at least a bit per layer, one for the guest-visible data, one for the top-level qcow2 metadata and another one for the second qcow2 metadata. > > In the end, this is just a pragmatic way to let 'qemu-img info' work > > while the image is a mirror target or intermediate node for commit, but > > to forbid cases where corrupted data is used. > > > > Or would you argue that either 'qemu-img info' shouldn't be working or > > reading corrupted data should be allowed in other processes? > > Having qemu-img info work on a mirror target makes sense; as you pointed > out, the metadata flushed to disk is consistent (may have leaked clusters, > but not broken image) at any given point by the writer (it has to be, since > the writer has to be able to recover the image if there is an abrupt > restart). Exactly, that's the reasoning. > And a second reader can still manage to see inconsistent data > when reading two separate clusters if the timing is just right, regardless > of whether the first writer is always able to see consistent data. So to > some extent, it's a measure of how risky is the action? qemu-img info is > read-only, and most of the time will just work; it is very hard to get to > the rare race condition where two consecutive reads raced with a parallel > writer combine to result in incorrect interpretation of the data by the > reader. This again is more about the write permission. You still need 'qemu-img info -U' because there is a concurrent writer, so you have to acknowledge the risk you're mentioning. > I'm not opposed to your patch, but am trying to make sure that I'm not > overlooking any problem before giving R-b. Maybe it's just that the > comment needs updating in v2. Do you have a suggestion for the comment? Kevin
On 12/01/2017 06:41 AM, Kevin Wolf wrote: >> >> I guess it is the block of writes/resizes that prevents metadata from >> getting inconsistent; CONSISTENT_READ does indeed make more sense if >> interpreted solely in light of will the guest read consistent data >> (and not will the format layer see consistent contents from the >> protocol layer). > > Yes, that's what the write/resize locks are meant for. > > Consistent read is more about "this image may not contain the useful > data you're expecting". > > >> I'm not opposed to your patch, but am trying to make sure that I'm not >> overlooking any problem before giving R-b. Maybe it's just that the >> comment needs updating in v2. > > Do you have a suggestion for the comment? Perhaps: Commit 1f4ad7d fixed 'qemu-img info' for raw images that are currently in use as a mirror target. It is not enough for image formats, though, as these still unconditionally request BLK_PERM_CONSISTENT_READ. As this permission is geared towards whether the guest-visible data is consistent, and has no impact on whether the metadata is sane, and 'qemu-img info' does not read guest-visible data (except for the raw format), it makes sense to not require BLK_PERM_CONSISTENT_READ if there is not going to be any guest I/O performed, regardless of image format. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
© 2016 - 2024 Red Hat, Inc.