[Qemu-devel] [PATCH 0/3] block: Make various formats' block_status recurse again

Max Reitz posted 3 patches 4 years, 9 months ago
Test FreeBSD passed
Test asan passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test checkpatch passed
Test s390x passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190725155512.9827-1-mreitz@redhat.com
Maintainers: Max Reitz <mreitz@redhat.com>, Stefan Weil <sw@weilnetz.de>, Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>
block/vdi.c  | 3 ++-
block/vmdk.c | 3 +++
block/vpc.c  | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
[Qemu-devel] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by Max Reitz 4 years, 9 months ago
Hi,

69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
would only go down to the protocol layer if the format layer returned
BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
information whether a given range in the image is zero or not.
Generally, this is because the image is preallocated and thus all ranges
appear as zeroes.

However, it only implemented this preallocation detection for qcow2.
There are more formats that support preallocation, though: vdi, vhdx,
vmdk, vpc.  (Funny how they all start with “v”.)

For vdi, vmdk, and vpc, the fix is rather simple, because they really
have different subformats depending on whether an image is preallocated
or not.  This makes the check very simple.

vhdx is more like qcow2, where after the image has been created, it
isn’t clear whether it’s been preallocated or everything is allocated
because everything was already written to.  69f47505ee added a heuristic
to qcow2 to get around this, but I think that’s too much for vhdx.  I
just left it unfixed, because I don’t care that much, honestly (and I
don’t think anyone else does).


Max Reitz (3):
  vdi: Make block_status recurse for fixed images
  vmdk: Make block_status recurse for flat extents
  vpc: Do not return RAW from block_status

 block/vdi.c  | 3 ++-
 block/vmdk.c | 3 +++
 block/vpc.c  | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.21.0


Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by John Snow 4 years, 9 months ago

On 7/25/19 11:55 AM, Max Reitz wrote:
> Hi,
> 
> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
> would only go down to the protocol layer if the format layer returned
> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
> information whether a given range in the image is zero or not.
> Generally, this is because the image is preallocated and thus all ranges
> appear as zeroes.
> 
> However, it only implemented this preallocation detection for qcow2.
> There are more formats that support preallocation, though: vdi, vhdx,
> vmdk, vpc.  (Funny how they all start with “v”.)
> 
> For vdi, vmdk, and vpc, the fix is rather simple, because they really
> have different subformats depending on whether an image is preallocated
> or not.  This makes the check very simple.
> 
> vhdx is more like qcow2, where after the image has been created, it
> isn’t clear whether it’s been preallocated or everything is allocated
> because everything was already written to.  69f47505ee added a heuristic
> to qcow2 to get around this, but I think that’s too much for vhdx.  I
> just left it unfixed, because I don’t care that much, honestly (and I
> don’t think anyone else does).
> 

What's the practical outcome of that, and is the limitation documented
somewhere?

(I'm fine with not fixing it, I just want it documented somehow.)

> 
> Max Reitz (3):
>   vdi: Make block_status recurse for fixed images
>   vmdk: Make block_status recurse for flat extents
>   vpc: Do not return RAW from block_status
> 
>  block/vdi.c  | 3 ++-
>  block/vmdk.c | 3 +++
>  block/vpc.c  | 2 +-
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 

Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by Max Reitz 4 years, 9 months ago
On 12.08.19 20:39, John Snow wrote:
> 
> 
> On 7/25/19 11:55 AM, Max Reitz wrote:
>> Hi,
>>
>> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
>> would only go down to the protocol layer if the format layer returned
>> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
>> information whether a given range in the image is zero or not.
>> Generally, this is because the image is preallocated and thus all ranges
>> appear as zeroes.
>>
>> However, it only implemented this preallocation detection for qcow2.
>> There are more formats that support preallocation, though: vdi, vhdx,
>> vmdk, vpc.  (Funny how they all start with “v”.)
>>
>> For vdi, vmdk, and vpc, the fix is rather simple, because they really
>> have different subformats depending on whether an image is preallocated
>> or not.  This makes the check very simple.
>>
>> vhdx is more like qcow2, where after the image has been created, it
>> isn’t clear whether it’s been preallocated or everything is allocated
>> because everything was already written to.  69f47505ee added a heuristic
>> to qcow2 to get around this, but I think that’s too much for vhdx.  I
>> just left it unfixed, because I don’t care that much, honestly (and I
>> don’t think anyone else does).
>>
> 
> What's the practical outcome of that, and is the limitation documented
> somewhere?

The outcome is that it if you preallocate a vhdx image
(subformat=fixed), you’ll see that all sectors contain data, even if
they may be zero sectors on the filesystem level.

I don’t think it’s user-visible whatsoever.

> (I'm fine with not fixing it, I just want it documented somehow.)

I am really not inclined to start any documentation on the
particularities with which qemu handles vhdx images.

(Especially so considering we don’t even have any documentation on the
qcow2 case.  The stress in my paragraph was “heuristic”.  If you
preallocate a qcow2 image, but then discard enough sectors that the
heuristic thinks you didn’t, you’ll have the same effect.  Or if you
grow a preallocated image without preallocating the new area.)

Max

>>
>> Max Reitz (3):
>>   vdi: Make block_status recurse for fixed images
>>   vmdk: Make block_status recurse for flat extents
>>   vpc: Do not return RAW from block_status
>>
>>  block/vdi.c  | 3 ++-
>>  block/vmdk.c | 3 +++
>>  block/vpc.c  | 2 +-
>>  3 files changed, 6 insertions(+), 2 deletions(-)
>>


Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by John Snow 4 years, 9 months ago

On 8/12/19 3:11 PM, Max Reitz wrote:
> On 12.08.19 20:39, John Snow wrote:
>>
>>
>> On 7/25/19 11:55 AM, Max Reitz wrote:
>>> Hi,
>>>
>>> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
>>> would only go down to the protocol layer if the format layer returned
>>> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
>>> information whether a given range in the image is zero or not.
>>> Generally, this is because the image is preallocated and thus all ranges
>>> appear as zeroes.
>>>
>>> However, it only implemented this preallocation detection for qcow2.
>>> There are more formats that support preallocation, though: vdi, vhdx,
>>> vmdk, vpc.  (Funny how they all start with “v”.)
>>>
>>> For vdi, vmdk, and vpc, the fix is rather simple, because they really
>>> have different subformats depending on whether an image is preallocated
>>> or not.  This makes the check very simple.
>>>
>>> vhdx is more like qcow2, where after the image has been created, it
>>> isn’t clear whether it’s been preallocated or everything is allocated
>>> because everything was already written to.  69f47505ee added a heuristic
>>> to qcow2 to get around this, but I think that’s too much for vhdx.  I
>>> just left it unfixed, because I don’t care that much, honestly (and I
>>> don’t think anyone else does).
>>>
>>
>> What's the practical outcome of that, and is the limitation documented
>> somewhere?
> 
> The outcome is that it if you preallocate a vhdx image
> (subformat=fixed), you’ll see that all sectors contain data, even if
> they may be zero sectors on the filesystem level.
> 
> I don’t think it’s user-visible whatsoever.
> 

But it might mean that doing things with sync=top might over-allocate
data depending on the destination, wouldn't it?

That's not crucial, but it's possibly visible, no?

>> (I'm fine with not fixing it, I just want it documented somehow.)
> 
> I am really not inclined to start any documentation on the
> particularities with which qemu handles vhdx images.
> 
> (Especially so considering we don’t even have any documentation on the
> qcow2 case.  The stress in my paragraph was “heuristic”.  If you
> preallocate a qcow2 image, but then discard enough sectors that the
> heuristic thinks you didn’t, you’ll have the same effect.  Or if you
> grow a preallocated image without preallocating the new area.)
> 
> Max
> 

"But our qcow2 docs are also bad" is the kind of argument I can't
*really* disagree with, but...

(I wish we did have a documentation manual per-format that mentioned
some gotchas and general info about each format, but I can't really ask
you to do that now: I just worry when I see patches like this that the
knowledge or memory that there ever was a quirk will vanish immediately.)

--js

Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by Max Reitz 4 years, 9 months ago
On 12.08.19 23:45, John Snow wrote:
> 
> 
> On 8/12/19 3:11 PM, Max Reitz wrote:
>> On 12.08.19 20:39, John Snow wrote:
>>>
>>>
>>> On 7/25/19 11:55 AM, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
>>>> would only go down to the protocol layer if the format layer returned
>>>> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
>>>> information whether a given range in the image is zero or not.
>>>> Generally, this is because the image is preallocated and thus all ranges
>>>> appear as zeroes.
>>>>
>>>> However, it only implemented this preallocation detection for qcow2.
>>>> There are more formats that support preallocation, though: vdi, vhdx,
>>>> vmdk, vpc.  (Funny how they all start with “v”.)
>>>>
>>>> For vdi, vmdk, and vpc, the fix is rather simple, because they really
>>>> have different subformats depending on whether an image is preallocated
>>>> or not.  This makes the check very simple.
>>>>
>>>> vhdx is more like qcow2, where after the image has been created, it
>>>> isn’t clear whether it’s been preallocated or everything is allocated
>>>> because everything was already written to.  69f47505ee added a heuristic
>>>> to qcow2 to get around this, but I think that’s too much for vhdx.  I
>>>> just left it unfixed, because I don’t care that much, honestly (and I
>>>> don’t think anyone else does).
>>>>
>>>
>>> What's the practical outcome of that, and is the limitation documented
>>> somewhere?
>>
>> The outcome is that it if you preallocate a vhdx image
>> (subformat=fixed), you’ll see that all sectors contain data, even if
>> they may be zero sectors on the filesystem level.
>>
>> I don’t think it’s user-visible whatsoever.
>>
> 
> But it might mean that doing things with sync=top might over-allocate
> data depending on the destination, wouldn't it?
> 
> That's not crucial, but it's possibly visible, no?

I don’t think it has anything to do with sync=top because whether a
block is zero on the protocol level has nothing to do with whether it is
allocated on the format level.

It may make a difference for convert which uses block_status to inquire
the zero status.  However, it also does zero-detection, so...

>>> (I'm fine with not fixing it, I just want it documented somehow.)
>>
>> I am really not inclined to start any documentation on the
>> particularities with which qemu handles vhdx images.
>>
>> (Especially so considering we don’t even have any documentation on the
>> qcow2 case.  The stress in my paragraph was “heuristic”.  If you
>> preallocate a qcow2 image, but then discard enough sectors that the
>> heuristic thinks you didn’t, you’ll have the same effect.  Or if you
>> grow a preallocated image without preallocating the new area.)
>>
>> Max
>>
> 
> "But our qcow2 docs are also bad" is the kind of argument I can't
> *really* disagree with, but...

My main argument is that nobody would read the vhdx docs anyway.

Max

> (I wish we did have a documentation manual per-format that mentioned
> some gotchas and general info about each format, but I can't really ask
> you to do that now: I just worry when I see patches like this that the
> knowledge or memory that there ever was a quirk will vanish immediately.)
> 
> --js
> 


Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by John Snow 4 years, 9 months ago

On 8/13/19 10:48 AM, Max Reitz wrote:
> On 12.08.19 23:45, John Snow wrote:
>>
>>
>> On 8/12/19 3:11 PM, Max Reitz wrote:
>>> On 12.08.19 20:39, John Snow wrote:
>>>>
>>>>
>>>> On 7/25/19 11:55 AM, Max Reitz wrote:
>>>>> Hi,
>>>>>
>>>>> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
>>>>> would only go down to the protocol layer if the format layer returned
>>>>> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
>>>>> information whether a given range in the image is zero or not.
>>>>> Generally, this is because the image is preallocated and thus all ranges
>>>>> appear as zeroes.
>>>>>
>>>>> However, it only implemented this preallocation detection for qcow2.
>>>>> There are more formats that support preallocation, though: vdi, vhdx,
>>>>> vmdk, vpc.  (Funny how they all start with “v”.)
>>>>>
>>>>> For vdi, vmdk, and vpc, the fix is rather simple, because they really
>>>>> have different subformats depending on whether an image is preallocated
>>>>> or not.  This makes the check very simple.
>>>>>
>>>>> vhdx is more like qcow2, where after the image has been created, it
>>>>> isn’t clear whether it’s been preallocated or everything is allocated
>>>>> because everything was already written to.  69f47505ee added a heuristic
>>>>> to qcow2 to get around this, but I think that’s too much for vhdx.  I
>>>>> just left it unfixed, because I don’t care that much, honestly (and I
>>>>> don’t think anyone else does).
>>>>>
>>>>
>>>> What's the practical outcome of that, and is the limitation documented
>>>> somewhere?
>>>
>>> The outcome is that it if you preallocate a vhdx image
>>> (subformat=fixed), you’ll see that all sectors contain data, even if
>>> they may be zero sectors on the filesystem level.
>>>
>>> I don’t think it’s user-visible whatsoever.
>>>
>>
>> But it might mean that doing things with sync=top might over-allocate
>> data depending on the destination, wouldn't it?
>>
>> That's not crucial, but it's possibly visible, no?
> 
> I don’t think it has anything to do with sync=top because whether a
> block is zero on the protocol level has nothing to do with whether it is
> allocated on the format level.
> 
> It may make a difference for convert which uses block_status to inquire
> the zero status.  However, it also does zero-detection, so...
> 

Oh, okay then. Probably... fine, but I have a nagging doubt relating to
some of the fallbacks in e.g. qcow2 that tend to inflate zeroes in some
cases (or used to. Maybe it's been fixed since.)

...but I can't point to anything, so it's fine, and I'm just drawing
things out for no reason.

Reviewed-by: John Snow <jsnow@redhat.com>

>>>> (I'm fine with not fixing it, I just want it documented somehow.)
>>>
>>> I am really not inclined to start any documentation on the
>>> particularities with which qemu handles vhdx images.
>>>
>>> (Especially so considering we don’t even have any documentation on the
>>> qcow2 case.  The stress in my paragraph was “heuristic”.  If you
>>> preallocate a qcow2 image, but then discard enough sectors that the
>>> heuristic thinks you didn’t, you’ll have the same effect.  Or if you
>>> grow a preallocated image without preallocating the new area.)
>>>
>>> Max
>>>
>>
>> "But our qcow2 docs are also bad" is the kind of argument I can't
>> *really* disagree with, but...
> 
> My main argument is that nobody would read the vhdx docs anyway.
> 
> Max
> 

That's the sort of thing I'd like to change, but I guess I haven't
really made good on that desire in any way, so what good is that?

--js

Re: [Qemu-devel] [PATCH 0/3] block: Make various formats' block_status recurse again
Posted by Max Reitz 4 years, 9 months ago
On 25.07.19 17:55, Max Reitz wrote:
> Hi,
> 
> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
> would only go down to the protocol layer if the format layer returned
> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
> information whether a given range in the image is zero or not.
> Generally, this is because the image is preallocated and thus all ranges
> appear as zeroes.
> 
> However, it only implemented this preallocation detection for qcow2.
> There are more formats that support preallocation, though: vdi, vhdx,
> vmdk, vpc.  (Funny how they all start with “v”.)
> 
> For vdi, vmdk, and vpc, the fix is rather simple, because they really
> have different subformats depending on whether an image is preallocated
> or not.  This makes the check very simple.
> 
> vhdx is more like qcow2, where after the image has been created, it
> isn’t clear whether it’s been preallocated or everything is allocated
> because everything was already written to.  69f47505ee added a heuristic
> to qcow2 to get around this, but I think that’s too much for vhdx.  I
> just left it unfixed, because I don’t care that much, honestly (and I
> don’t think anyone else does).
> 
> 
> Max Reitz (3):
>   vdi: Make block_status recurse for fixed images
>   vmdk: Make block_status recurse for flat extents
>   vpc: Do not return RAW from block_status
> 
>  block/vdi.c  | 3 ++-
>  block/vmdk.c | 3 +++
>  block/vpc.c  | 2 +-
>  3 files changed, 6 insertions(+), 2 deletions(-)

Thanks for the reviews, applied to my block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max