[PATCH v2 3/5] nbd: make nbd_export_close_all() synchronous

Vladimir Sementsov-Ogievskiy posted 5 patches 5 years, 7 months ago
Maintainers: Max Reitz <mreitz@redhat.com>, Eric Blake <eblake@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
Posted by Vladimir Sementsov-Ogievskiy 5 years, 7 months ago
Consider nbd_export_close_all(). The call stack looks like this:
 nbd_export_close_all() -> nbd_export_close() -> client_close() for
each client.

client_close() doesn't guarantee that the client is closed: nbd_trip()
keeps a reference to it. So nbd_export_close_all() just reduces the
reference counter on the export and removes it from the list, but
guarantees neither that nbd_trip() has finished nor that the export has
actually been removed.

Let's wait until all exports are actually removed.

Without this fix, the following crash is possible:

- export a bitmap through the internal QEMU NBD server
- connect a client
- shut down QEMU

On shutdown, nbd_export_close_all() is called, but it doesn't actually
wait for nbd_trip() to finish and release its references. So the export
is not released, the exported bitmap remains busy, and the attempt to
remove the bitmap (which is part of bdrv_close()) fails the assertion:

bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

v2: rewritten; try to wait for the exports directly.

Note: I'm not sure I understand AIO_WAIT_WHILE and related things
correctly, and really hope for review.
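
To make the intended behaviour easier to review, here is a toy,
single-threaded model of the idea. It is not QEMU code and all names in
it are hypothetical: counters stand in for the "exports" and
"closed_exports" lists, poll_once() stands in for one event loop
iteration, and kick() stands in for aio_wait_kick(). It only shows that
closing an export drops the list's reference, while the wait loop keeps
running until every in-flight request has dropped its own reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: none of these names are QEMU APIs. */
#define MAX_ITEMS 8

typedef struct Export {
    int refcount;
    bool closed;                      /* set once the export is unpublished */
} Export;

static Export *live[MAX_ITEMS];       /* stand-in for the "exports" list */
static int n_live;
static int n_closed;                  /* length of a "closed_exports" list */
static Export *in_flight[MAX_ITEMS];  /* requests still holding a reference */
static int n_in_flight;
static int n_kicks;                   /* how often a waiter would be woken */

static void kick(void)
{
    n_kicks++;
}

/* Drop one reference; the last one frees the export and wakes the waiter. */
static void export_put(Export *exp)
{
    assert(exp->refcount > 0);
    if (--exp->refcount == 0) {
        assert(exp->closed);
        n_closed--;
        free(exp);
        kick();
    }
}

static Export *export_new(void)
{
    Export *exp = calloc(1, sizeof(*exp));
    assert(exp && n_live < MAX_ITEMS);
    exp->refcount = 1;                /* reference owned by the live list */
    live[n_live++] = exp;
    return exp;
}

/* A request in progress keeps the export alive, like nbd_trip() does. */
static void request_start(Export *exp)
{
    assert(n_in_flight < MAX_ITEMS);
    exp->refcount++;
    in_flight[n_in_flight++] = exp;
}

/* One simulated event loop iteration: finish one in-flight request. */
static bool poll_once(void)
{
    if (n_in_flight == 0) {
        return false;
    }
    export_put(in_flight[--n_in_flight]);
    return true;
}

/* Unpublish the last live export and drop the list's reference. */
static void export_close_last(void)
{
    Export *exp = live[--n_live];
    exp->closed = true;
    n_closed++;
    export_put(exp);
}

/* Close everything, then poll until no export is left in either set. */
static int export_close_all(void)
{
    int iterations = 0;
    while (n_live > 0) {
        export_close_last();
    }
    while (n_live > 0 || n_closed > 0) {
        bool progressed = poll_once();
        assert(progressed);
        iterations++;
    }
    return iterations;
}
```

In this model, an export with requests in flight survives
export_close_last() and is only freed from inside poll_once(), which is
why export_close_all() has to keep polling after the first loop.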


 nbd/server.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 20754e9ebc..9d64b00f4b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -102,6 +102,8 @@ struct NBDExport {
 };
 
 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
+static QTAILQ_HEAD(, NBDExport) closed_exports =
+        QTAILQ_HEAD_INITIALIZER(closed_exports);
 
 /* NBDExportMetaContexts represents a list of contexts to be exported,
  * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
@@ -1655,6 +1657,7 @@ void nbd_export_close(NBDExport *exp)
         g_free(exp->name);
         exp->name = NULL;
         QTAILQ_REMOVE(&exports, exp, next);
+        QTAILQ_INSERT_TAIL(&closed_exports, exp, next);
     }
     g_free(exp->description);
     exp->description = NULL;
@@ -1717,7 +1720,9 @@ void nbd_export_put(NBDExport *exp)
             g_free(exp->export_bitmap_context);
         }
 
+        QTAILQ_REMOVE(&closed_exports, exp, next);
         g_free(exp);
+        aio_wait_kick();
     }
 }
 
@@ -1737,6 +1742,9 @@ void nbd_export_close_all(void)
         nbd_export_close(exp);
         aio_context_release(aio_context);
     }
+
+    AIO_WAIT_WHILE(NULL, !(QTAILQ_EMPTY(&exports) &&
+                           QTAILQ_EMPTY(&closed_exports)));
 }
 
 static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
-- 
2.18.0


Re: [PATCH v2 3/5] nbd: make nbd_export_close_all() synchronous
Posted by Eric Blake 5 years, 7 months ago
On 7/1/20 5:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Consider nbd_export_close_all(). The call stack looks like this:
>   nbd_export_close_all() -> nbd_export_close() -> client_close() for
> each client.
> 
> client_close() doesn't guarantee that the client is closed: nbd_trip()
> keeps a reference to it. So nbd_export_close_all() just reduces the
> reference counter on the export and removes it from the list, but
> guarantees neither that nbd_trip() has finished nor that the export has
> actually been removed.
> 
> Let's wait until all exports are actually removed.
> 
> Without this fix, the following crash is possible:
> 
> - export a bitmap through the internal QEMU NBD server
> - connect a client
> - shut down QEMU
> 
> On shutdown, nbd_export_close_all() is called, but it doesn't actually
> wait for nbd_trip() to finish and release its references. So the export
> is not released, the exported bitmap remains busy, and the attempt to
> remove the bitmap (which is part of bdrv_close()) fails the assertion:
> 
> bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> v2: rewritten; try to wait for the exports directly.
> 
> Note: I'm not sure I understand AIO_WAIT_WHILE and related things
> correctly, and really hope for review.

I'm also a bit weak on whether the AIO_WAIT_WHILE is being used 
correctly.  But the idea behind the patch makes sense to me, and since 
it is a bug fix, it will be okay to apply this for -rc1 or even -rc2 if 
needed (I'm not including it in my pull request today, however).

> 
> 
>   nbd/server.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 20754e9ebc..9d64b00f4b 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -102,6 +102,8 @@ struct NBDExport {
>   };
>   
>   static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
> +static QTAILQ_HEAD(, NBDExport) closed_exports =
> +        QTAILQ_HEAD_INITIALIZER(closed_exports);
>   
>   /* NBDExportMetaContexts represents a list of contexts to be exported,
>    * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
> @@ -1655,6 +1657,7 @@ void nbd_export_close(NBDExport *exp)
>           g_free(exp->name);
>           exp->name = NULL;
>           QTAILQ_REMOVE(&exports, exp, next);
> +        QTAILQ_INSERT_TAIL(&closed_exports, exp, next);
>       }
>       g_free(exp->description);
>       exp->description = NULL;
> @@ -1717,7 +1720,9 @@ void nbd_export_put(NBDExport *exp)
>               g_free(exp->export_bitmap_context);
>           }
>   
> +        QTAILQ_REMOVE(&closed_exports, exp, next);
>           g_free(exp);
> +        aio_wait_kick();
>       }
>   }
>   
> @@ -1737,6 +1742,9 @@ void nbd_export_close_all(void)
>           nbd_export_close(exp);
>           aio_context_release(aio_context);
>       }
> +
> +    AIO_WAIT_WHILE(NULL, !(QTAILQ_EMPTY(&exports) &&
> +                           QTAILQ_EMPTY(&closed_exports)));
>   }
>   
>   static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
> 

weak:
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org