NBD reconnect logic considers the error code from the functions that
read NBD messages to tell if reconnect should be attempted or not: it is
attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT
state (see nbd_channel_error). This error code is propagated from the
primitives like nbd_read.
The problem, however, is that nbd_read itself turns every error into -1
rather than -EIO. As a result, if the NBD server happens to die while
sending the message, the client in QEMU receives less data than it
expects, considers it as a fatal error, and wouldn't attempt
reestablishing the connection.
Fix it by turning every negative return from qio_channel_read_all into
-EIO returned from nbd_read. Apparently that was the original behavior,
but got broken later. Also adjust nbd_readXX to follow.
Fixes: e6798f06a6 ("nbd: generalize usage of nbd_read")
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
include/block/nbd.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4a52a43ef5..5f34d23bb0 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -364,7 +364,7 @@ static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
if (desc) {
error_prepend(errp, "Failed to read %s: ", desc);
}
- return -1;
+ return ret;
}
return 0;
@@ -375,8 +375,9 @@ static inline int nbd_read##bits(QIOChannel *ioc, \
uint##bits##_t *val, \
const char *desc, Error **errp) \
{ \
- if (nbd_read(ioc, val, sizeof(*val), desc, errp) < 0) { \
- return -1; \
+ int ret = nbd_read(ioc, val, sizeof(*val), desc, errp); \
+ if (ret < 0) { \
+ return ret; \
} \
*val = be##bits##_to_cpu(*val); \
return 0; \
--
2.29.2
28.01.2021 23:14, Roman Kagan wrote:
> NBD reconnect logic considers the error code from the functions that
> read NBD messages to tell if reconnect should be attempted or not: it is
> attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT
> state (see nbd_channel_error). This error code is propagated from the
> primitives like nbd_read.
>
> The problem, however, is that nbd_read itself turns every error into -1
> rather than -EIO. As a result, if the NBD server happens to die while
> sending the message, the client in QEMU receives less data than it
> expects, considers it as a fatal error, and wouldn't attempt
> reestablishing the connection.
>
> Fix it by turning every negative return from qio_channel_read_all into
> -EIO returned from nbd_read. Apparently that was the original behavior,
> but got broken later. Also adjust nbd_readXX to follow.
>
> Fixes: e6798f06a6 ("nbd: generalize usage of nbd_read")
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Really looks like a bug in e6798f06a6: it changes error code from -EIO to -1 without any reasoning.
> ---
> include/block/nbd.h | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 4a52a43ef5..5f34d23bb0 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -364,7 +364,7 @@ static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
> if (desc) {
> error_prepend(errp, "Failed to read %s: ", desc);
> }
> - return -1;
> + return ret;
> }
>
> return 0;
> @@ -375,8 +375,9 @@ static inline int nbd_read##bits(QIOChannel *ioc, \
> uint##bits##_t *val, \
> const char *desc, Error **errp) \
> { \
> - if (nbd_read(ioc, val, sizeof(*val), desc, errp) < 0) { \
> - return -1; \
> + int ret = nbd_read(ioc, val, sizeof(*val), desc, errp); \
> + if (ret < 0) { \
> + return ret; \
> } \
> *val = be##bits##_to_cpu(*val); \
> return 0; \
>
--
Best regards,
Vladimir
© 2016 - 2026 Red Hat, Inc.