When bdrv_getlength() is called from handle_aiocb_write_zeroes(), it
creates a new coroutine and then waits for it to finish using
AIO_WAIT_WHILE.
The problem is that handle_aiocb_write_zeroes() can also run in a worker
thread, which has a different AioContext from the main loop and iothreads.
In that case in_aio_context_home_thread(ctx) == false in AIO_WAIT_WHILE,
and therefore
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
in the else branch fails, crashing QEMU.

Aside from that, bdrv_getlength() is also conceptually wrong here, because
it reads the BDS graph from another thread without being protected by
any lock.

Replace it with raw_co_getlength(), which doesn't create a coroutine and
doesn't read the BDS graph.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
block/file-posix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index d3073a7caa..9a99111f45 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1738,7 +1738,7 @@ static int handle_aiocb_write_zeroes(void *opaque)
 #ifdef CONFIG_FALLOCATE
     /* Last resort: we are trying to extend the file with zeroed data. This
      * can be done via fallocate(fd, 0) */
-    len = bdrv_getlength(aiocb->bs);
+    len = raw_co_getlength(aiocb->bs);
     if (s->has_fallocate && len >= 0 && aiocb->aio_offset >= len) {
         int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
         if (ret == 0 || ret != -ENOTSUP) {
--
2.39.1
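
The failure mode described in the commit message can be sketched outside QEMU. The following is only an illustration under stated assumptions, not QEMU code: DemoCtx, main_ctx, demo_may_wait() and demo_run_in_worker() are hypothetical names, and a "context" is reduced to the thread that owns it. demo_may_wait() loosely mirrors the two branches mentioned above: waiting is fine in the context's home thread, otherwise the caller must be the main loop thread. A worker thread satisfies neither condition.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: it only records its home thread. */
typedef struct DemoCtx {
    pthread_t home;
} DemoCtx;

/* Plays the role of the main loop context; the caller must set .home. */
static DemoCtx main_ctx;

/* True if the calling thread is the home thread of ctx. */
static bool demo_in_home_thread(const DemoCtx *ctx)
{
    return pthread_equal(pthread_self(), ctx->home) != 0;
}

/*
 * May the calling thread issue a blocking wait on ctx?
 * First branch: we are in ctx's home thread.
 * Else branch: we must be in the main loop thread (this is the condition
 * that the assertion quoted in the commit message enforces).
 */
static bool demo_may_wait(const DemoCtx *ctx)
{
    if (demo_in_home_thread(ctx)) {
        return true;
    }
    return demo_in_home_thread(&main_ctx);
}

typedef struct DemoJob {
    const DemoCtx *ctx;
    bool may_wait;
} DemoJob;

/* Thread body: evaluate the check from inside the worker thread. */
static void *demo_worker(void *opaque)
{
    DemoJob *job = opaque;
    job->may_wait = demo_may_wait(job->ctx);
    return NULL;
}

/*
 * Run the check from a freshly created worker thread.
 * Returns 1 if waiting would be allowed, 0 if not, -1 if no thread
 * could be created.
 */
static int demo_run_in_worker(const DemoCtx *ctx)
{
    DemoJob job = { .ctx = ctx, .may_wait = false };
    pthread_t tid;

    if (pthread_create(&tid, NULL, demo_worker, &job) != 0) {
        return -1;
    }
    pthread_join(tid, NULL);
    return job.may_wait ? 1 : 0;
}
```

With main_ctx.home set to the calling thread, demo_may_wait(&main_ctx) holds in that thread, while demo_run_in_worker(&main_ctx) reports that the same wait is not allowed from the worker. Older toolchains may need -pthread when building this.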
On 09.02.2023 at 16:45, Emanuele Giuseppe Esposito wrote:
> When calling bdrv_getlength() in handle_aiocb_write_zeroes(), the
> function creates a new coroutine and then waits that it finishes using
> AIO_WAIT_WHILE.
> The problem is that this function could also run in a worker thread,
> that has a different AioContext from main loop and iothreads, therefore
> in AIO_WAIT_WHILE we will have in_aio_context_home_thread(ctx) == false
> and therefore
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch will fail, crashing QEMU.
>
> Aside from that, bdrv_getlength() is wrong also conceptually, because
> it reads the BDS graph from another thread and is not protected by
> any lock.
>
> Replace it with raw_co_getlength, that doesn't create a coroutine and
> doesn't read the BDS graph.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Thanks, applied to the block branch.

Kevin

On 09.02.2023 at 16:45, Emanuele Giuseppe Esposito wrote:
> When calling bdrv_getlength() in handle_aiocb_write_zeroes(), the
> function creates a new coroutine and then waits that it finishes using
> AIO_WAIT_WHILE.
> The problem is that this function could also run in a worker thread,
> that has a different AioContext from main loop and iothreads, therefore
> in AIO_WAIT_WHILE we will have in_aio_context_home_thread(ctx) == false
> and therefore
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch will fail, crashing QEMU.
>
> Aside from that, bdrv_getlength() is wrong also conceptually, because
> it reads the BDS graph from another thread and is not protected by
> any lock.
>
> Replace it with raw_co_getlength, that doesn't create a coroutine and
> doesn't read the BDS graph.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> block/file-posix.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index d3073a7caa..9a99111f45 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1738,7 +1738,7 @@ static int handle_aiocb_write_zeroes(void *opaque)
>  #ifdef CONFIG_FALLOCATE
>      /* Last resort: we are trying to extend the file with zeroed data. This
>       * can be done via fallocate(fd, 0) */
> -    len = bdrv_getlength(aiocb->bs);
> +    len = raw_co_getlength(aiocb->bs);
>      if (s->has_fallocate && len >= 0 && aiocb->aio_offset >= len) {
>          int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
>          if (ret == 0 || ret != -ENOTSUP) {

Obviously this relies on the fact that raw_co_getlength() doesn't
actually depend on running in coroutine context. Could be done in a
separate patch, but I think we should rename it back to raw_getlength()
and remove the coroutine_fn annotation again. Seems commit c86422c5549
was a little too eager.

Kevin

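
To illustrate why a length query of this kind need not depend on coroutine context, here is a minimal standalone sketch. It assumes the length can be obtained with a plain lseek() on the file descriptor; demo_getlength() and demo_zero_extend() are hypothetical helpers, not the QEMU functions, and posix_fallocate() is swapped in as a portable stand-in for the fallocate(fd, 0) call in the patch context. Everything here is an ordinary blocking syscall, so it can run in any thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the length of the file behind fd, or -errno on failure.
 * Note that lseek() moves the file offset; callers that use
 * pread()/pwrite() are not affected by that.
 */
static int64_t demo_getlength(int fd)
{
    off_t size = lseek(fd, 0, SEEK_END);

    if (size < 0) {
        return -errno;
    }
    return size;
}

/*
 * Extend the file with zeroed data, but only if the requested range starts
 * at or beyond the current end of file (the "last resort" condition in the
 * hunk above). Returns 0 on success, -ENOTSUP if the range starts inside
 * the file, or another negative errno value.
 */
static int demo_zero_extend(int fd, off_t offset, off_t nbytes)
{
    int64_t len = demo_getlength(fd);
    int err;

    if (len < 0) {
        return (int)len;
    }
    if (offset < len) {
        return -ENOTSUP;
    }
    /* posix_fallocate() returns the error number directly, not via errno. */
    err = posix_fallocate(fd, offset, nbytes);
    return err ? -err : 0;
}
```

For a 4096-byte file, demo_zero_extend(fd, 0, 4096) is rejected with -ENOTSUP because the range lies inside the file, while demo_zero_extend(fd, 4096, 4096) takes the allocation path.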
On 9/2/23 16:45, Emanuele Giuseppe Esposito wrote:
> When calling bdrv_getlength() in handle_aiocb_write_zeroes(), the
> function creates a new coroutine and then waits that it finishes using
> AIO_WAIT_WHILE.
> The problem is that this function could also run in a worker thread,
> that has a different AioContext from main loop and iothreads, therefore
> in AIO_WAIT_WHILE we will have in_aio_context_home_thread(ctx) == false
> and therefore
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch will fail, crashing QEMU.
>
> Aside from that, bdrv_getlength() is wrong also conceptually, because
> it reads the BDS graph from another thread and is not protected by
> any lock.
>
> Replace it with raw_co_getlength, that doesn't create a coroutine and
> doesn't read the BDS graph.

Reported-by: Ninad Palsule <ninad@linux.vnet.ibm.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>

> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> block/file-posix.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index d3073a7caa..9a99111f45 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1738,7 +1738,7 @@ static int handle_aiocb_write_zeroes(void *opaque)
>  #ifdef CONFIG_FALLOCATE
>      /* Last resort: we are trying to extend the file with zeroed data. This
>       * can be done via fallocate(fd, 0) */
> -    len = bdrv_getlength(aiocb->bs);
> +    len = raw_co_getlength(aiocb->bs);
>      if (s->has_fallocate && len >= 0 && aiocb->aio_offset >= len) {
>          int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
>          if (ret == 0 || ret != -ENOTSUP) {