[Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img

Maxim Levitsky posted 1 patch 4 years, 9 months ago
Test checkpatch passed
Test s390x passed
Test asan passed
Test docker-mingw@fedora passed
Test FreeBSD passed
Test docker-clang@ubuntu passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190710170349.1548-1-mlevitsk@redhat.com
Maintainers: Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>
There is a newer version of this series
block/crypto.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
[Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Maxim Levitsky 4 years, 9 months ago
preallocation=off and preallocation=metadata
both allocate luks header only, and preallocation=falloc/full
is passed to underlying file, with the given image size.

Note that the actual preallocated size is a bit smaller due
to luks header.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/crypto.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 8237424ae6..74b789d278 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -251,6 +251,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 static int block_crypto_co_create_generic(BlockDriverState *bs,
                                           int64_t size,
                                           QCryptoBlockCreateOptions *opts,
+                                          PreallocMode prealloc,
                                           Error **errp)
 {
     int ret;
@@ -266,6 +267,13 @@ static int block_crypto_co_create_generic(BlockDriverState *bs,
         goto cleanup;
     }
 
+    if (prealloc != PREALLOC_MODE_OFF) {
+        ret = blk_truncate(blk, size, prealloc, errp);
+        if (ret < 0) {
+                goto cleanup;
+        }
+    }
+
     data = (struct BlockCryptoCreateData) {
         .blk = blk,
         .size = size,
@@ -516,7 +524,7 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     };
 
     ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
-                                         errp);
+                                         PREALLOC_MODE_OFF, errp);
     if (ret < 0) {
         goto fail;
     }
@@ -534,12 +542,28 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
     QCryptoBlockCreateOptions *create_opts = NULL;
     BlockDriverState *bs = NULL;
     QDict *cryptoopts;
+    PreallocMode prealloc;
+    char *buf = NULL;
     int64_t size;
     int ret;
+    Error *local_err = NULL;
 
     /* Parse options */
     size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 
+    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+    prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
+                                   PREALLOC_MODE_OFF, &local_err);
+    g_free(buf);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    if (prealloc == PREALLOC_MODE_METADATA) {
+        prealloc  = PREALLOC_MODE_OFF;
+    }
+
     cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
                                              &block_crypto_create_opts_luks,
                                              true);
@@ -565,7 +589,7 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
     }
 
     /* Create format layer */
-    ret = block_crypto_co_create_generic(bs, size, create_opts, errp);
+    ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp);
     if (ret < 0) {
         goto fail;
     }
-- 
2.17.2


Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Max Reitz 4 years, 9 months ago
On 10.07.19 19:03, Maxim Levitsky wrote:
> preallocation=off and preallocation=metadata
> both allocate luks header only, and preallocation=falloc/full
> is passed to underlying file, with the given image size.
> 
> Note that the actual preallocated size is a bit smaller due
> to luks header.

Couldn’t you just preallocate it after creating the crypto header so
qcrypto_block_get_payload_offset(crypto->block) + size is the actual
file size?

> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)

Hm.  I would expect a preallocated image to read 0.  But if you just
pass this through to the protocol layer, it won’t read 0.

(In fact, I don’t even quite see the point of having LUKS as an own
format still.  It was useful when qcow2 didn’t have LUKS support, but
now it does, so...  I suppose everyone using the LUKS format should
actually be using qcow2 with LUKS?)

Max

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Daniel P. Berrangé 4 years, 9 months ago
On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
> On 10.07.19 19:03, Maxim Levitsky wrote:
> > preallocation=off and preallocation=metadata
> > both allocate luks header only, and preallocation=falloc/full
> > is passed to underlying file, with the given image size.
> > 
> > Note that the actual preallocated size is a bit smaller due
> > to luks header.
> 
> Couldn’t you just preallocate it after creating the crypto header so
> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> file size?

Yeah that would be preferrable. If that's really not possible, we
could likely provide some API to query the expected hreader size for
a given set of creation options. 

> 
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/crypto.c | 28 ++++++++++++++++++++++++++--
> >  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> Hm.  I would expect a preallocated image to read 0.  But if you just
> pass this through to the protocol layer, it won’t read 0.

Yes, it will be zeros at the physical layer, but unintelligble
garbage from POV of the virtual disk.

I don't think this is really a problem though - this is what you
get already if you create a LUKS volume on top of a block device
today.

AFAIK, we've not documented that preallocation guarantees future
reads will return zeros. Preallocation simply ensures that all
required space is allocated upfront. We do mention that it might
be achieved by writing zeros to the underlying storage but never
said you'll get zeros back.

IOW I think its at most a docs problem to more clearly explain
that preallocation != guaranteed zeros for reads.

> (In fact, I don’t even quite see the point of having LUKS as an own
> format still.  It was useful when qcow2 didn’t have LUKS support, but
> now it does, so...  I suppose everyone using the LUKS format should
> actually be using qcow2 with LUKS?)

Certainly not. LUKS on raw is going to be very common, not least because
that's directly compatible with what Linux kernel supports. If you don't
want the features of qcow2 like snapshots, it just adds overhead and mgmt
complexity for no gain, especially if dealing with block device backed
storage (iSCSI, RBD).

OpenStack will use cryptsetup when initializing its block storage with
LUKS, then tell QEMU to run with the raw + LUKS driver.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Max Reitz 4 years, 9 months ago
On 11.07.19 11:20, Daniel P. Berrangé wrote:
> On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
>> On 10.07.19 19:03, Maxim Levitsky wrote:
>>> preallocation=off and preallocation=metadata
>>> both allocate luks header only, and preallocation=falloc/full
>>> is passed to underlying file, with the given image size.
>>>
>>> Note that the actual preallocated size is a bit smaller due
>>> to luks header.
>>
>> Couldn’t you just preallocate it after creating the crypto header so
>> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
>> file size?
> 
> Yeah that would be preferrable. If that's really not possible, we
> could likely provide some API to query the expected hreader size for
> a given set of creation options. 
> 
>>
>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> Hm.  I would expect a preallocated image to read 0.  But if you just
>> pass this through to the protocol layer, it won’t read 0.
> 
> Yes, it will be zeros at the physical layer, but unintelligble
> garbage from POV of the virtual disk.
> 
> I don't think this is really a problem though - this is what you
> get already if you create a LUKS volume on top of a block device
> today.

Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
driver does not implement, hence it being treated as false.

But if you are preallocating, you have a choice of what you write, and
why not make that zeroes?

> AFAIK, we've not documented that preallocation guarantees future
> reads will return zeros. Preallocation simply ensures that all
> required space is allocated upfront. We do mention that it might
> be achieved by writing zeros to the underlying storage but never
> said you'll get zeros back.

But we have, as I wrote in my second reply.  PreallocMode's
documentation says that at least “full” is writing zeroes, and to say
those zeroes can be anywhere in the stack is cheating, from my POV.

> IOW I think its at most a docs problem to more clearly explain
> that preallocation != guaranteed zeros for reads.
> 
>> (In fact, I don’t even quite see the point of having LUKS as an own
>> format still.  It was useful when qcow2 didn’t have LUKS support, but
>> now it does, so...  I suppose everyone using the LUKS format should
>> actually be using qcow2 with LUKS?)
> 
> Certainly not. LUKS on raw is going to be very common, not least because
> that's directly compatible with what Linux kernel supports. If you don't
> want the features of qcow2 like snapshots, it just adds overhead and mgmt
> complexity for no gain, especially if dealing with block device backed
> storage (iSCSI, RBD).
> 
> OpenStack will use cryptsetup when initializing its block storage with
> LUKS, then tell QEMU to run with the raw + LUKS driver.

I see the compatibility with the Linux kernel, yes (as I wrote in my
second reply), but I’m not sure whether “overhead” really is that big of
a point when using encryption.

Max

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Max Reitz 4 years, 9 months ago
On 11.07.19 14:23, Max Reitz wrote:
> On 11.07.19 11:20, Daniel P. Berrangé wrote:
>> On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:

[...]

>>> Hm.  I would expect a preallocated image to read 0.  But if you just
>>> pass this through to the protocol layer, it won’t read 0.
>>
>> Yes, it will be zeros at the physical layer, but unintelligble
>> garbage from POV of the virtual disk.
>>
>> I don't think this is really a problem though - this is what you
>> get already if you create a LUKS volume on top of a block device
>> today.
> 
> Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
> driver does not implement, hence it being treated as false.
> 
> But if you are preallocating, you have a choice of what you write, and
> why not make that zeroes?
> 
>> AFAIK, we've not documented that preallocation guarantees future
>> reads will return zeros. Preallocation simply ensures that all
>> required space is allocated upfront. We do mention that it might
>> be achieved by writing zeros to the underlying storage but never
>> said you'll get zeros back.
> 
> But we have, as I wrote in my second reply.  PreallocMode's
> documentation says that at least “full” is writing zeroes, and to say
> those zeroes can be anywhere in the stack is cheating, from my POV.

I should add that I don’t mind changing the current documentation too much:

>> IOW I think its at most a docs problem to more clearly explain
>> that preallocation != guaranteed zeros for reads.

If there is a good reason to do that, sure.  But it needs to be done
explicitly, with an accompanying justification.  I don’t like just
ignoring the documentation we have.

(And yes, if something says “this writes zeroes”, I personally will
always interpret that as “it will read as zeroes”.)

Max

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Daniel P. Berrangé 4 years, 9 months ago
On Thu, Jul 11, 2019 at 02:23:55PM +0200, Max Reitz wrote:
> On 11.07.19 11:20, Daniel P. Berrangé wrote:
> > On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
> >> On 10.07.19 19:03, Maxim Levitsky wrote:
> >>> preallocation=off and preallocation=metadata
> >>> both allocate luks header only, and preallocation=falloc/full
> >>> is passed to underlying file, with the given image size.
> >>>
> >>> Note that the actual preallocated size is a bit smaller due
> >>> to luks header.
> >>
> >> Couldn’t you just preallocate it after creating the crypto header so
> >> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> >> file size?
> > 
> > Yeah that would be preferrable. If that's really not possible, we
> > could likely provide some API to query the expected hreader size for
> > a given set of creation options. 
> > 
> >>
> >>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> >>>
> >>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> >>> ---
> >>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
> >>>  1 file changed, 26 insertions(+), 2 deletions(-)
> >>
> >> Hm.  I would expect a preallocated image to read 0.  But if you just
> >> pass this through to the protocol layer, it won’t read 0.
> > 
> > Yes, it will be zeros at the physical layer, but unintelligble
> > garbage from POV of the virtual disk.
> > 
> > I don't think this is really a problem though - this is what you
> > get already if you create a LUKS volume on top of a block device
> > today.
> 
> Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
> driver does not implement, hence it being treated as false.
> 
> But if you are preallocating, you have a choice of what you write, and
> why not make that zeroes?
> 
> > AFAIK, we've not documented that preallocation guarantees future
> > reads will return zeros. Preallocation simply ensures that all
> > required space is allocated upfront. We do mention that it might
> > be achieved by writing zeros to the underlying storage but never
> > said you'll get zeros back.
> 
> But we have, as I wrote in my second reply.  PreallocMode's
> documentation says that at least “full” is writing zeroes, and to say
> those zeroes can be anywhere in the stack is cheating, from my POV.

I guess it depends on your interpretation of the docs. In qemu-img
man page it says

  "falloc" mode preallocates space for image by calling posix_fallocate().
  "full" mode preallocates space for image by writing zeros to underlying
  storage.

To me both those sentances are talking about the lowest level in the
stack, closest to the physical storage medium, though I can understand
if people have other interpretations.

> > IOW I think its at most a docs problem to more clearly explain
> > that preallocation != guaranteed zeros for reads.
> > 
> >> (In fact, I don’t even quite see the point of having LUKS as an own
> >> format still.  It was useful when qcow2 didn’t have LUKS support, but
> >> now it does, so...  I suppose everyone using the LUKS format should
> >> actually be using qcow2 with LUKS?)
> > 
> > Certainly not. LUKS on raw is going to be very common, not least because
> > that's directly compatible with what Linux kernel supports. If you don't
> > want the features of qcow2 like snapshots, it just adds overhead and mgmt
> > complexity for no gain, especially if dealing with block device backed
> > storage (iSCSI, RBD).
> > 
> > OpenStack will use cryptsetup when initializing its block storage with
> > LUKS, then tell QEMU to run with the raw + LUKS driver.
> 
> I see the compatibility with the Linux kernel, yes (as I wrote in my
> second reply), but I’m not sure whether “overhead” really is that big of
> a point when using encryption.

Overhead is not purely about CPU burn. There's non-negligible memory
overhead for qcow2s data tables that doesn't exist at all with raw.
The mgmt complexity & interoperability is the real killer feature
benefit of raw + LUKS vs qcow + LUKS though.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Max Reitz 4 years, 9 months ago
On 10.07.19 23:24, Max Reitz wrote:
> On 10.07.19 19:03, Maxim Levitsky wrote:
>> preallocation=off and preallocation=metadata
>> both allocate luks header only, and preallocation=falloc/full
>> is passed to underlying file, with the given image size.
>>
>> Note that the actual preallocated size is a bit smaller due
>> to luks header.
> 
> Couldn’t you just preallocate it after creating the crypto header so
> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> file size?
> 
>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>
>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>> ---
>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> Hm.  I would expect a preallocated image to read 0.  But if you just
> pass this through to the protocol layer, it won’t read 0.
> 
> (In fact, I don’t even quite see the point of having LUKS as an own
> format still.  It was useful when qcow2 didn’t have LUKS support, but
> now it does, so...  I suppose everyone using the LUKS format should
> actually be using qcow2 with LUKS?)

Kevin just pointed out to me that our LUKS format is compatible to the
actual layout cryptsetup uses.  OK, that is an important use case.

Hm.  Unfortunately, that doesn’t really necessitate preallocation.

Well, whatever.  If it’s simple enough, that shouldn’t stop us from
implementing preallocation anyway.


Now I found that qapi/block-core.json defines PreallocMode’s falloc and
full values as follows:

> # @falloc: like @full preallocation but allocate disk space by
> #          posix_fallocate() rather than writing zeros.
> # @full: preallocate all data by writing zeros to device to ensure disk
> #        space is really available. @full preallocation also sets up
> #        metadata correctly.

So it isn’t just me who expects these to pre-initialize the image to 0.
 Hm, although...  I suppose @falloc technically does not specify whether
the data reads as zeroes.  I kind of find it to be implied, but, well...

Max

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Maxim Levitsky 4 years, 9 months ago
On Wed, 2019-07-10 at 23:52 +0200, Max Reitz wrote:
> On 10.07.19 23:24, Max Reitz wrote:
> > On 10.07.19 19:03, Maxim Levitsky wrote:
> > > preallocation=off and preallocation=metadata
> > > both allocate luks header only, and preallocation=falloc/full
> > > is passed to underlying file, with the given image size.
> > > 
> > > Note that the actual preallocated size is a bit smaller due
> > > to luks header.
> > 
> > Couldn’t you just preallocate it after creating the crypto header so
> > qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> > file size?

I kind of thought of the same thing after I send the patch. I'll see now it I can make it work.


> > 
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  block/crypto.c | 28 ++++++++++++++++++++++++++--
> > >  1 file changed, 26 insertions(+), 2 deletions(-)
> > 
> > Hm.  I would expect a preallocated image to read 0.  But if you just
> > pass this through to the protocol layer, it won’t read 0.
> > 
> > (In fact, I don’t even quite see the point of having LUKS as an own
> > format still.  It was useful when qcow2 didn’t have LUKS support, but
> > now it does, so...  I suppose everyone using the LUKS format should
> > actually be using qcow2 with LUKS?)
> 
> Kevin just pointed out to me that our LUKS format is compatible to the
> actual layout cryptsetup uses.  OK, that is an important use case.
> 
> Hm.  Unfortunately, that doesn’t really necessitate preallocation.
> 
> Well, whatever.  If it’s simple enough, that shouldn’t stop us from
> implementing preallocation anyway.
Exactly. Since I already know the area of qemu-img relatively well, and
this bug is on my backlog, I thought why not to do it.


> 
> 
> Now I found that qapi/block-core.json defines PreallocMode’s falloc and
> full values as follows:
> 
> > # @falloc: like @full preallocation but allocate disk space by
> > #          posix_fallocate() rather than writing zeros.
> > # @full: preallocate all data by writing zeros to device to ensure disk
> > #        space is really available. @full preallocation also sets up
> > #        metadata correctly.
> 
> So it isn’t just me who expects these to pre-initialize the image to 0.
>  Hm, although...  I suppose @falloc technically does not specify whether
> the data reads as zeroes.  I kind of find it to be implied, but, well...

I personally don't really think that zeros are important, but rather the level of allocation.
posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.

On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.

In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
preallocation mode), since underlying storage might 'compress' the zeros. 

In this version I do have a bug that I mentioned, about not preallocation some data at the end of the image, and I will
fix it, so that all image is zeros as expected

Best regards,
	Maxim Levitsky


> 
> Max
> 



Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Max Reitz 4 years, 9 months ago
On 11.07.19 10:39, Maxim Levitsky wrote:
> On Wed, 2019-07-10 at 23:52 +0200, Max Reitz wrote:
>> On 10.07.19 23:24, Max Reitz wrote:
>>> On 10.07.19 19:03, Maxim Levitsky wrote:
>>>> preallocation=off and preallocation=metadata
>>>> both allocate luks header only, and preallocation=falloc/full
>>>> is passed to underlying file, with the given image size.
>>>>
>>>> Note that the actual preallocated size is a bit smaller due
>>>> to luks header.
>>>
>>> Couldn’t you just preallocate it after creating the crypto header so
>>> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
>>> file size?
> 
> I kind of thought of the same thing after I send the patch. I'll see now it I can make it work.
> 
> 
>>>
>>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>>>
>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>> ---
>>>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>>>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>>
>>> Hm.  I would expect a preallocated image to read 0.  But if you just
>>> pass this through to the protocol layer, it won’t read 0.
>>>
>>> (In fact, I don’t even quite see the point of having LUKS as an own
>>> format still.  It was useful when qcow2 didn’t have LUKS support, but
>>> now it does, so...  I suppose everyone using the LUKS format should
>>> actually be using qcow2 with LUKS?)
>>
>> Kevin just pointed out to me that our LUKS format is compatible to the
>> actual layout cryptsetup uses.  OK, that is an important use case.
>>
>> Hm.  Unfortunately, that doesn’t really necessitate preallocation.
>>
>> Well, whatever.  If it’s simple enough, that shouldn’t stop us from
>> implementing preallocation anyway.
> Exactly. Since I already know the area of qemu-img relatively well, and
> this bug is on my backlog, I thought why not to do it.
> 
> 
>>
>>
>> Now I found that qapi/block-core.json defines PreallocMode’s falloc and
>> full values as follows:
>>
>>> # @falloc: like @full preallocation but allocate disk space by
>>> #          posix_fallocate() rather than writing zeros.
>>> # @full: preallocate all data by writing zeros to device to ensure disk
>>> #        space is really available. @full preallocation also sets up
>>> #        metadata correctly.
>>
>> So it isn’t just me who expects these to pre-initialize the image to 0.
>>  Hm, although...  I suppose @falloc technically does not specify whether
>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> 
> I personally don't really think that zeros are important, but rather the level of allocation.
> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
> 
> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
> 
> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
> preallocation mode), since underlying storage might 'compress' the zeros. 

Which is actually an argument why you should just write zeroes on the
LUKS layer, because this will then turn into quasi-random data on the
protocol layer.

Max

> In this version I do have a bug that I mentioned, about not preallocation some data at the end of the image, and I will
> fix it, so that all image is zeros as expected
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
>>
>> Max
>>
> 
> 


Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Eric Blake 4 years, 9 months ago
On 7/11/19 7:24 AM, Max Reitz wrote:

>>> So it isn’t just me who expects these to pre-initialize the image to 0.
>>>  Hm, although...  I suppose @falloc technically does not specify whether
>>> the data reads as zeroes.  I kind of find it to be implied, but, well...
>>
>> I personally don't really think that zeros are important, but rather the level of allocation.
>> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
>>
>> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
>> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
>>
>> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
>> preallocation mode), since underlying storage might 'compress' the zeros. 
> 
> Which is actually an argument why you should just write zeroes on the
> LUKS layer, because this will then turn into quasi-random data on the
> protocol layer.

We want preallocation to be fast (insofar as possible). Writing zeroes
in LUKS is not fast, because it forces random data on the protocol
layer; while writing zeroes on the protocol layer can be fast, even if
it reads back as random on the LUKS layer. If you WANT to guarantee
reading zeroes, that's image scrubbing, not preallocation.  I think this
patch is taking the right approach, of letting the underlying layer
allocate data efficiently (but the burden is then on the underlying
layer to actually allocate data, and not optimize by compressing zeroes
into non-allocated storage).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Kevin Wolf 4 years, 9 months ago
Am 11.07.2019 um 15:50 hat Eric Blake geschrieben:
> On 7/11/19 7:24 AM, Max Reitz wrote:
> 
> >>> So it isn’t just me who expects these to pre-initialize the image to 0.
> >>>  Hm, although...  I suppose @falloc technically does not specify whether
> >>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> >>
> >> I personally don't really think that zeros are important, but rather the level of allocation.
> >> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
> >>
> >> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
> >> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
> >>
> >> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
> >> preallocation mode), since underlying storage might 'compress' the zeros. 
> > 
> > Which is actually an argument why you should just write zeroes on the
> > LUKS layer, because this will then turn into quasi-random data on the
> > protocol layer.
> 
> We want preallocation to be fast (insofar as possible). Writing zeroes
> in LUKS is not fast, because it forces random data on the protocol
> layer; while writing zeroes on the protocol layer can be fast, even if
> it reads back as random on the LUKS layer. If you WANT to guarantee
> reading zeroes, that's image scrubbing, not preallocation.  I think this
> patch is taking the right approach, of letting the underlying layer
> allocate data efficiently (but the burden is then on the underlying
> layer to actually allocate data, and not optimize by compressing zeroes
> into non-allocated storage).

Isn't letting the host efficiently preallocate things what we have
preallocation=falloc for? We implement preallocation=full as explicit
writes to make sure that no shortcuts are taken and things are _really_
preallocated throughout all layers. Not being efficient, but thorough is
almost like the whole point of the option.

So I'm inclined to think that writing zeros on the LUKS layer would be
right for full preallocation.

Kevin
Re: [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img
Posted by Daniel P. Berrangé 4 years, 9 months ago
On Thu, Jul 11, 2019 at 08:50:56AM -0500, Eric Blake wrote:
> On 7/11/19 7:24 AM, Max Reitz wrote:
> 
> >>> So it isn’t just me who expects these to pre-initialize the image to 0.
> >>>  Hm, although...  I suppose @falloc technically does not specify whether
> >>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> >>
> >> I personally don't really think that zeros are important, but rather the level of allocation.
> >> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
> >>
> >> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
> >> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
> >>
> >> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
> >> preallocation mode), since underlying storage might 'compress' the zeros. 
> > 
> > Which is actually an argument why you should just write zeroes on the
> > LUKS layer, because this will then turn into quasi-random data on the
> > protocol layer.
> 
> We want preallocation to be fast (insofar as possible). Writing zeroes
> in LUKS is not fast, because it forces random data on the protocol
> layer; while writing zeroes on the protocol layer can be fast, even if
> it reads back as random on the LUKS layer. If you WANT to guarantee
> reading zeroes, that's image scrubbing, not preallocation.  I think this
> patch is taking the right approach, of letting the underlying layer
> allocate data efficiently (but the burden is then on the underlying
> layer to actually allocate data, and not optimize by compressing zeroes
> into non-allocated storage).

On the topic of scrubbing, it would actually be nice to have a
"secure delete" for QEMU block driver formats that can do some
level of scrubbing in software and/or calling out to hardware support.

Similarly to prealloc a choice of 'metadata' or 'full'. Wwith LUKS
you can do well by just scrubbing the image header (which kills the
master decryption key rendering payload useless).

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

[Qemu-devel] [PATCH v2] LUKS: support preallocation in qemu-img
Posted by Maxim Levitsky 4 years, 9 months ago
preallocation=off and preallocation=metadata
both allocate luks header only, and preallocation=falloc/full
is passed to underlying file.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/crypto.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 8237424ae6..cbc291301e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -74,6 +74,7 @@ static ssize_t block_crypto_read_func(QCryptoBlock *block,
 struct BlockCryptoCreateData {
     BlockBackend *blk;
     uint64_t size;
+    PreallocMode prealloc;
 };
 
 
@@ -112,7 +113,7 @@ static ssize_t block_crypto_init_func(QCryptoBlock *block,
      * available to the guest, so we must take account of that
      * which will be used by the crypto header
      */
-    return blk_truncate(data->blk, data->size + headerlen, PREALLOC_MODE_OFF,
+    return blk_truncate(data->blk, data->size + headerlen, data->prealloc,
                         errp);
 }
 
@@ -251,6 +252,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 static int block_crypto_co_create_generic(BlockDriverState *bs,
                                           int64_t size,
                                           QCryptoBlockCreateOptions *opts,
+                                          PreallocMode prealloc,
                                           Error **errp)
 {
     int ret;
@@ -269,6 +271,7 @@ static int block_crypto_co_create_generic(BlockDriverState *bs,
     data = (struct BlockCryptoCreateData) {
         .blk = blk,
         .size = size,
+        .prealloc = prealloc,
     };
 
     crypto = qcrypto_block_create(opts, NULL,
@@ -516,7 +519,7 @@ block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     };
 
     ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
-                                         errp);
+                                         PREALLOC_MODE_OFF, errp);
     if (ret < 0) {
         goto fail;
     }
@@ -534,12 +537,28 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
     QCryptoBlockCreateOptions *create_opts = NULL;
     BlockDriverState *bs = NULL;
     QDict *cryptoopts;
+    PreallocMode prealloc;
+    char *buf = NULL;
     int64_t size;
     int ret;
+    Error *local_err = NULL;
 
     /* Parse options */
     size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 
+    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+    prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
+                                   PREALLOC_MODE_OFF, &local_err);
+    g_free(buf);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    if (prealloc == PREALLOC_MODE_METADATA) {
+        prealloc  = PREALLOC_MODE_OFF;
+    }
+
     cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
                                              &block_crypto_create_opts_luks,
                                              true);
@@ -565,7 +584,7 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
     }
 
     /* Create format layer */
-    ret = block_crypto_co_create_generic(bs, size, create_opts, errp);
+    ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp);
     if (ret < 0) {
         goto fail;
     }
-- 
2.17.2


Re: [Qemu-devel] [PATCH v2] LUKS: support preallocation in qemu-img
Posted by Max Reitz 4 years, 9 months ago
On 11.07.19 11:11, Maxim Levitsky wrote:
> preallocation=off and preallocation=metadata
> both allocate luks header only, and preallocation=falloc/full
> is passed to underlying file.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/crypto.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)

FWIW, do you see the implementation of block_crypto_co_truncate()?
Like, how it just passes preallocation requests through to the
underlying layer?  How I said it shouldn’t be done?

Yes, that was me, in commit 7ea37c30660.

So, er, yeah.

> diff --git a/block/crypto.c b/block/crypto.c
> index 8237424ae6..cbc291301e 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
[...]

> @@ -534,12 +537,28 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
>      QCryptoBlockCreateOptions *create_opts = NULL;
>      BlockDriverState *bs = NULL;
>      QDict *cryptoopts;
> +    PreallocMode prealloc;
> +    char *buf = NULL;
>      int64_t size;
>      int ret;
> +    Error *local_err = NULL;
>  
>      /* Parse options */
>      size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
>  
> +    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
> +    prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
> +                                   PREALLOC_MODE_OFF, &local_err);

Please align such lines to the opening parenthesis.

> +    g_free(buf);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return -EINVAL;
> +    }
> +
> +    if (prealloc == PREALLOC_MODE_METADATA) {
> +        prealloc  = PREALLOC_MODE_OFF;

There is one space too many here.

> +    }
> +

I think you also need to add a @preallocation parameter to
BlockdevCreateOptionsLUKS and handle it in block_crypto_co_create_luks().

Max

>      cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
>                                               &block_crypto_create_opts_luks,
>                                               true);

Re: [Qemu-devel] [PATCH v2] LUKS: support preallocation in qemu-img
Posted by Maxim Levitsky 4 years, 9 months ago
On Thu, 2019-07-11 at 15:43 +0200, Max Reitz wrote:
> On 11.07.19 11:11, Maxim Levitsky wrote:
> > preallocation=off and preallocation=metadata
> > both allocate luks header only, and preallocation=falloc/full
> > is passed to underlying file.
> > 
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/crypto.c | 25 ++++++++++++++++++++++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> FWIW, do you see the implementation of block_crypto_co_truncate()?
> Like, how it just passes preallocation requests through to the
> underlying layer?  How I said it shouldn’t be done?
> 
> Yes, that was me, in commit 7ea37c30660.
> 
> So, er, yeah.
> 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 8237424ae6..cbc291301e 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> 
> [...]
> 
> > @@ -534,12 +537,28 @@ static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
> >      QCryptoBlockCreateOptions *create_opts = NULL;
> >      BlockDriverState *bs = NULL;
> >      QDict *cryptoopts;
> > +    PreallocMode prealloc;
> > +    char *buf = NULL;
> >      int64_t size;
> >      int ret;
> > +    Error *local_err = NULL;
> >  
> >      /* Parse options */
> >      size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
> >  
> > +    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
> > +    prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
> > +                                   PREALLOC_MODE_OFF, &local_err);
> 
> Please align such lines to the opening parenthesis.
True - I really need to invest some time to update the checkpatch.pl
in qemu source tree to be up to date, or find a way to use the kernel one,
it is so useful to let it catch these things instead of wasting your time.

> 
> > +    g_free(buf);
> > +    if (local_err) {
> > +        error_propagate(errp, local_err);
> > +        return -EINVAL;
> > +    }
> > +
> > +    if (prealloc == PREALLOC_MODE_METADATA) {
> > +        prealloc  = PREALLOC_MODE_OFF;
> 
> There is one space too many here.
Oops, same thing as above.

> 
> > +    }
> > +
> 
> I think you also need to add a @preallocation parameter to
> BlockdevCreateOptionsLUKS and handle it in block_crypto_co_create_luks().

I was under impression that with new qmp based blockdev-create api, the user
should pretty much do the preallocation itself on the underlying block file,
and then create the luks on it.

However I do see that qcow2 has a preallocation mode there, so I guess I was wrong.
Will do it now.

Best regards,
	Maxim Levitsky