nbd/client: Deal with unaligned size from server

[Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Eric Blake 5 years ago

When a server advertises an unaligned size but no block sizes, the
code was rounding up to a sector-aligned size (a known limitation of
bdrv_getlength(); maybe 4.1 will fix it to be byte-accurate), then
assuming a request_alignment of 512 (the recommendation of the NBD
spec for maximum portability).  However, this means that qemu will
actually attempt to access the padding bytes of the trailing partial
sector, without realizing that it is risky.

An easy demonstration, using nbdkit as the server:
$ nbdkit -fv random size=1023
$ qemu-io -r -f raw -c 'r -v 0 1023' nbd://localhost:10809
read failed: Invalid argument

because the client rounded the request up to 1024 bytes, which nbdkit
then rejected as beyond the advertised size of 1023.

Note that qemu as the server refuses to send an unaligned size, as it
has already rounded the unaligned image up to sector size, and then
happily resizes the image on access (at least when serving a POSIX
file over NBD). But since third-party servers may decide to kill the
connection when we access out of bounds, it's better to just ignore
the unaligned tail than it is to risk problems. We can undo this hack
later once the block layer quits rounding sizes inappropriately.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---

This is the minimal patch to avoid our client behaving out-of-spec,
but I don't like that it makes the tail of the server's file
unreadable. A better solution of auditing the block layer to use a
proper byte length everywhere instead of rounding up may be 4.1
material, but is certainly not appropriate for 4.0-rc2. I could also
teach the NBD code to compare every request from the block length
against client.info.size, and manually handle the tail itself, but
that feels like a lot of pain (for something the block layer should be
able to do generically, and where any NBD-specific tail-handling code
just slows down the common case and would be ripped out again once the
block layer quits rounding up). I'm willing to include this with my
other NBD patches slated for -rc2 as part of my suite of improved
third-party interoperability, but only if we agree that the truncation
is appropriate.

Note that nbdkit includes '--filter=truncate round-up=512' as a way to
expose the unaligned tail without making qemu trigger out-of-bounds
operations: reads of the tail now succeed with contents of 0; writes
fail with ENOSPC unless the contents requested to be written into the
tail are all 0s.  So not fixing the bug for 4.0, and telling people to
use nbdkit's truncate filter instead, is also a viable option.

 block/nbd.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2e72df528ac..a2cd365f646 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -463,7 +463,17 @@ static int64_t nbd_getlength(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;

-    return s->client.info.size;
+    /*
+     * HACK: Round down to sector alignment. qemu as server always
+     * advertises a sector-aligned image, so we only ever see
+     * unaligned sizes with third-party servers. The block layer
+     * defaults to rounding up, but that can result in us attempting
+     * to read beyond EOF from the remote server; better is to
+     * forcefully round down here and just treat the last few bytes
+     * from the server as unreadable.  This can all go away once the
+     * block layer uses actual byte lengths instead of rounding up.
+     */
+    return QEMU_ALIGN_DOWN(s->client.info.size, BDRV_SECTOR_SIZE);
 }

 static void nbd_detach_aio_context(BlockDriverState *bs)
-- 
2.20.1

Re: [Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Vladimir Sementsov-Ogievskiy 5 years ago

28.03.2019 5:02, Eric Blake wrote:
> When a server advertises an unaligned size but no block sizes, the
> code was rounding up to a sector-aligned size (a known limitation of
> bdrv_getlength(); maybe 4.1 will fix it to be byte-accurate), then
> assuming a request_alignment of 512 (the recommendation of the NBD
> spec for maximum portability).  However, this means that qemu will
> actually attempt to access the padding bytes of the trailing partial
> sector, without realizing that it is risky.
> 
> An easy demonstration, using nbdkit as the server:
> $ nbdkit -fv random size=1023
> $ qemu-io -r -f raw -c 'r -v 0 1023' nbd://localhost:10809
> read failed: Invalid argument
> 
> because the client rounded the request up to 1024 bytes, which nbdkit
> then rejected as beyond the advertised size of 1023.
> 
> Note that qemu as the server refuses to send an unaligned size, as it
> has already rounded the unaligned image up to sector size, and then
> happily resizes the image on access (at least when serving a POSIX
> file over NBD). But since third-party servers may decide to kill the
> connection when we access out of bounds, it's better to just ignore
> the unaligned tail than it is to risk problems. We can undo this hack
> later once the block layer quits rounding sizes inappropriately.
> 
> Reported-by: Richard W.M. Jones <rjones@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> This is the minimal patch to avoid our client behaving out-of-spec,
> but I don't like that it makes the tail of the server's file
> unreadable. A better solution of auditing the block layer to use a
> proper byte length everywhere instead of rounding up may be 4.1
> material, but is certainly not appropriate for 4.0-rc2. I could also
> teach the NBD code to compare every request from the block length
> against client.info.size, and manually handle the tail itself, but
> that feels like a lot of pain (for something the block layer should be
> able to do generically, and where any NBD-specific tail-handling code
> just slows down the common case and would be ripped out again once the
> block layer quits rounding up). I'm willing to include this with my
> other NBD patches slated for -rc2 as part of my suite of improved
> third-party interoperability, but only if we agree that the truncation
> is appropriate.
> 
> Note that nbdkit includes '--filter=truncate round-up=512' as a way to
> expose the unaligned tail without making qemu trigger out-of-bounds
> operations: reads of the tail now succeed with contents of 0; writes
> fail with ENOSPC unless the contents requested to be written into the
> tail are all 0s.  So not fixing the bug for 4.0, and telling people to
> use nbdkit's truncate filter instead, is also a viable option.
> 
>   block/nbd.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 2e72df528ac..a2cd365f646 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -463,7 +463,17 @@ static int64_t nbd_getlength(BlockDriverState *bs)
>   {
>       BDRVNBDState *s = bs->opaque;
> 
> -    return s->client.info.size;
> +    /*
> +     * HACK: Round down to sector alignment. qemu as server always
> +     * advertises a sector-aligned image, so we only ever see
> +     * unaligned sizes with third-party servers. The block layer
> +     * defaults to rounding up, but that can result in us attempting
> +     * to read beyond EOF from the remote server; better is to
> +     * forcefully round down here and just treat the last few bytes
> +     * from the server as unreadable.  This can all go away once the
> +     * block layer uses actual byte lengths instead of rounding up.
> +     */
> +    return QEMU_ALIGN_DOWN(s->client.info.size, BDRV_SECTOR_SIZE);
>   }
> 
>   static void nbd_detach_aio_context(BlockDriverState *bs)
> 


What is wrong with correct EIO from server? I'm not sure that noisily ignoring
file tail is better.

Assume qemu-img convert from such nbd export: with your patch it will just ignore
last bytes, and user will think that all is OK. I'd prefer not doing it.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Kevin Wolf 5 years ago

Am 28.03.2019 um 03:02 hat Eric Blake geschrieben:
> When a server advertises an unaligned size but no block sizes, the
> code was rounding up to a sector-aligned size (a known limitation of
> bdrv_getlength(); maybe 4.1 will fix it to be byte-accurate), then
> assuming a request_alignment of 512 (the recommendation of the NBD
> spec for maximum portability).  However, this means that qemu will
> actually attempt to access the padding bytes of the trailing partial
> sector, without realizing that it is risky.
> 
> An easy demonstration, using nbdkit as the server:
> $ nbdkit -fv random size=1023
> $ qemu-io -r -f raw -c 'r -v 0 1023' nbd://localhost:10809
> read failed: Invalid argument
> 
> because the client rounded the request up to 1024 bytes, which nbdkit
> then rejected as beyond the advertised size of 1023.
> 
> Note that qemu as the server refuses to send an unaligned size, as it
> has already rounded the unaligned image up to sector size, and then
> happily resizes the image on access (at least when serving a POSIX
> file over NBD). But since third-party servers may decide to kill the
> connection when we access out of bounds, it's better to just ignore
> the unaligned tail than it is to risk problems. We can undo this hack
> later once the block layer quits rounding sizes inappropriately.
> 
> Reported-by: Richard W.M. Jones <rjones@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>

I think making the behaviour inconsistent across different block
drivers, so that some round up and some round down, is a bad idea.
Even without the inconsistency, rounding down is already a bad idea
because it means data loss when you copy the disk.

This leaves two options:

1. Leave things as they are, size gets rounded up. With some servers,
   we might get the data we need, but padded with zeros or garbage. This
   should be fine. With other servers, you might get an I/O error and
   the connection might be dropped when you actually access the last
   block. Nasty, but at least it doesn't fail silently, and you can
   still access the rest of the image.

2. Just return an error in .bdrv_open when the NBD device has an
   unaligned size. No surprising I/O errors or dropped connections later
   on, but you can't access the image at all.

Kevin

Re: [Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Eric Blake 5 years ago

On 3/28/19 6:40 AM, Kevin Wolf wrote:

>> Note that qemu as the server refuses to send an unaligned size, as it
>> has already rounded the unaligned image up to sector size, and then
>> happily resizes the image on access (at least when serving a POSIX
>> file over NBD). But since third-party servers may decide to kill the
>> connection when we access out of bounds, it's better to just ignore
>> the unaligned tail than it is to risk problems. We can undo this hack
>> later once the block layer quits rounding sizes inappropriately.
>>
>> Reported-by: Richard W.M. Jones <rjones@redhat.com>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> I think making the behaviour inconsistent across different block
> drivers, so that some round up and some round down, is a bad idea.
> Even without the inconsistency, rounding down is already a bad idea
> because it means data loss when you copy the disk.

Concur, truncation is safe, but surprising when it is not done
consistently (and we're too late to switch away from our current
behavior of rounding up).

> 
> This leaves two options:
> 
> 1. Leave things as they are, size gets rounded up. With some servers,
>    we might get the data we need, but padded with zeros or garbage. This
>    should be fine. With other servers, you might get an I/O error and
>    the connection might be dropped when you actually access the last
>    block. Nasty, but at least it doesn't fail silently, and you can
>    still access the rest of the image.

Well, you can only access the rest of the image if qemu doesn't try to
do a read larger than the last sector. 'qemu-img convert' has the nasty
habit of trying to read as much as possible, and then dying on the EIO
caused by the partial last sector even though it COULD have split the
request into the head (which is still readable) and the tail (< 512
bytes), for maximum copying.

> 
> 2. Just return an error in .bdrv_open when the NBD device has an
>    unaligned size. No surprising I/O errors or dropped connections later
>    on, but you can't access the image at all.

Tempting, but harsh.

Option 3:

Teach the nbd code to special-case past-EOF read and block-status to do
the right thing, but not worry about write/write-zero/trim (those will
fail, but that's less important for qemu-img convert).

I'll post a v2 patch along those lines.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Richard W.M. Jones 5 years ago

On Thu, Mar 28, 2019 at 07:23:33AM -0500, Eric Blake wrote:
> On 3/28/19 6:40 AM, Kevin Wolf wrote:
> 
> >> Note that qemu as the server refuses to send an unaligned size, as it
> >> has already rounded the unaligned image up to sector size, and then
> >> happily resizes the image on access (at least when serving a POSIX
> >> file over NBD). But since third-party servers may decide to kill the
> >> connection when we access out of bounds, it's better to just ignore
> >> the unaligned tail than it is to risk problems. We can undo this hack
> >> later once the block layer quits rounding sizes inappropriately.
> >>
> >> Reported-by: Richard W.M. Jones <rjones@redhat.com>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> > 
> > I think making the behaviour inconsistent across different block
> > drivers, so that some round up and some round down, is a bad idea.
> > Even without the inconsistency, rounding down is already a bad idea
> > because it means data loss when you copy the disk.
> 
> Concur, truncation is safe, but surprising when it is not done
> consistently (and we're too late to switch away from our current
> behavior of rounding up).
> 
> > 
> > This leaves two options:
> > 
> > 1. Leave things as they are, size gets rounded up. With some servers,
> >    we might get the data we need, but padded with zeros or garbage. This
> >    should be fine. With other servers, you might get an I/O error and
> >    the connection might be dropped when you actually access the last
> >    block. Nasty, but at least it doesn't fail silently, and you can
> >    still access the rest of the image.
> 
> Well, you can only access the rest of the image if qemu doesn't try to
> do a read larger than the last sector. 'qemu-img convert' has the nasty
> habit of trying to read as much as possible, and then dying on the EIO
> caused by the partial last sector even though it COULD have split the
> request into the head (which is still readable) and the tail (< 512
> bytes), for maximum copying.

Yes I just noticed this again ...

$ nbdkit -U - -f memory size=513 --run 'qemu-img convert $nbd /var/tmp/out'
nbdkit: memory.2: error: invalid request: NBD_CMD_READ: offset and count are out of range: offset=0 count=1024
qemu-img: error while reading sector 0: Invalid argument

Annoying but can be fixed by the blocksize or truncate filters.

Rich.

> > 
> > 2. Just return an error in .bdrv_open when the NBD device has an
> >    unaligned size. No surprising I/O errors or dropped connections later
> >    on, but you can't access the image at all.
> 
> Tempting, but harsh.
> 
> Option 3:
> 
> Teach the nbd code to special-case past-EOF read and block-status to do
> the right thing, but not worry about write/write-zero/trim (those will
> fail, but that's less important for qemu-img convert).
> 
> I'll post a v2 patch along those lines.
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 




-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Re: [Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Kevin Wolf 5 years ago

Am 28.03.2019 um 13:23 hat Eric Blake geschrieben:
> On 3/28/19 6:40 AM, Kevin Wolf wrote:
> 
> >> Note that qemu as the server refuses to send an unaligned size, as it
> >> has already rounded the unaligned image up to sector size, and then
> >> happily resizes the image on access (at least when serving a POSIX
> >> file over NBD). But since third-party servers may decide to kill the
> >> connection when we access out of bounds, it's better to just ignore
> >> the unaligned tail than it is to risk problems. We can undo this hack
> >> later once the block layer quits rounding sizes inappropriately.
> >>
> >> Reported-by: Richard W.M. Jones <rjones@redhat.com>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> > 
> > I think making the behaviour inconsistent across different block
> > drivers, so that some round up and some round down, is a bad idea.
> > Even without the inconsistency, rounding down is already a bad idea
> > because it means data loss when you copy the disk.
> 
> Concur, truncation is safe, but surprising when it is not done
> consistently (and we're too late to switch away from our current
> behavior of rounding up).
> 
> > 
> > This leaves two options:
> > 
> > 1. Leave things as they are, size gets rounded up. With some servers,
> >    we might get the data we need, but padded with zeros or garbage. This
> >    should be fine. With other servers, you might get an I/O error and
> >    the connection might be dropped when you actually access the last
> >    block. Nasty, but at least it doesn't fail silently, and you can
> >    still access the rest of the image.
> 
> Well, you can only access the rest of the image if qemu doesn't try to
> do a read larger than the last sector. 'qemu-img convert' has the nasty
> habit of trying to read as much as possible, and then dying on the EIO
> caused by the partial last sector even though it COULD have split the
> request into the head (which is still readable) and the tail (< 512
> bytes), for maximum copying.
> 
> > 
> > 2. Just return an error in .bdrv_open when the NBD device has an
> >    unaligned size. No surprising I/O errors or dropped connections later
> >    on, but you can't access the image at all.
> 
> Tempting, but harsh.

I honestly think it would be okay. If your image isn't even aligned on
512 bytes, it's probably not a Network Block Device, but just a random
Network File. That's not what the protocol was meant for.

> Option 3:
> 
> Teach the nbd code to special-case past-EOF read and block-status to do
> the right thing, but not worry about write/write-zero/trim (those will
> fail, but that's less important for qemu-img convert).
> 
> I'll post a v2 patch along those lines.

This would be similar to what file-posix does for short reads (just adds
zero padding), so it should be reasonable to do the same in NBD. This is
probably the nicest option.

Kevin

Re: [Qemu-devel] [PATCH for-4.0?] nbd/client: Deal with unaligned size from server

Posted by Eric Blake 5 years ago

On 3/28/19 7:36 AM, Kevin Wolf wrote:

>>>
>>> I think making the behaviour inconsistent across different block
>>> drivers, so that some round up and some round down, is a bad idea.
>>> Even without the inconsistency, rounding down is already a bad idea
>>> because it means data loss when you copy the disk.
>>
>> Concur, truncation is safe, but surprising when it is not done
>> consistently (and we're too late to switch away from our current
>> behavior of rounding up).
>>

>> Option 3:
>>
>> Teach the nbd code to special-case past-EOF read and block-status to do
>> the right thing, but not worry about write/write-zero/trim (those will
>> fail, but that's less important for qemu-img convert).
>>
>> I'll post a v2 patch along those lines.
> 
> This would be similar to what file-posix does for short reads (just adds
> zero padding), so it should be reasonable to do the same in NBD. This is
> probably the nicest option.

Now posted: Message-Id: <20190329042750.14704-1-eblake@redhat.com>
Subject: [PATCH for-4.0 v3 0/6] NBD server alignment improvement

I think there's still a bug in the blkdebug driver for letting unaligned
block status leak back to the block layer, but I'm out of time to
investigate that today, and blkdebug is not used in production images,
so it's less urgent.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org