softmmu/dma-helpers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
assert(dbs->acb) is meant to check the return value of io_func per
documented in commit 6bee44ea34 ("dma: the passed io_func does not
return NULL"). However, there is a chance that after calling
aio_context_release(dbs->ctx); the dma_blk_cb function is called before
the assertion and dbs->acb is set to NULL again at line 121. Thus when
we run assert at line 181 it will fail.
softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed.
Reported-by: Francisco Londono <f.londono@samsung.com>
Signed-off-by: Tong Zhang <t.zhang2@samsung.com>
---
softmmu/dma-helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 7820fec54c..cb81017928 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret)
aio_context_acquire(dbs->ctx);
dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
dma_blk_cb, dbs, dbs->io_func_opaque);
- aio_context_release(dbs->ctx);
assert(dbs->acb);
+ aio_context_release(dbs->ctx);
}
static void dma_aio_cancel(BlockAIOCB *acb)
--
2.25.1
On 27.04.22 22:51, Tong Zhang wrote: > assert(dbs->acb) is meant to check the return value of io_func per > documented in commit 6bee44ea34 ("dma: the passed io_func does not > return NULL"). However, there is a chance that after calling > aio_context_release(dbs->ctx); the dma_blk_cb function is called before > the assertion and dbs->acb is set to NULL again at line 121. Thus when > we run assert at line 181 it will fail. > > softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. > > Reported-by: Francisco Londono <f.londono@samsung.com> > Signed-off-by: Tong Zhang <t.zhang2@samsung.com> > --- > softmmu/dma-helpers.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c > index 7820fec54c..cb81017928 100644 > --- a/softmmu/dma-helpers.c > +++ b/softmmu/dma-helpers.c > @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) > aio_context_acquire(dbs->ctx); > dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, > dma_blk_cb, dbs, dbs->io_func_opaque); > - aio_context_release(dbs->ctx); > assert(dbs->acb); > + aio_context_release(dbs->ctx); > } > > static void dma_aio_cancel(BlockAIOCB *acb) I'm fairly new to that code, but I wonder what prevents dma_blk_cb() to run after you reshuffled the code? After all, acquire/release is only around the dbs->io_func() call, so I don't immediately see how it interacts with re-entrance? -- Thanks, David / dhildenb
Hi David, On Mon, May 30, 2022 at 9:19 AM David Hildenbrand <david@redhat.com> wrote: > > On 27.04.22 22:51, Tong Zhang wrote: > > assert(dbs->acb) is meant to check the return value of io_func per > > documented in commit 6bee44ea34 ("dma: the passed io_func does not > > return NULL"). However, there is a chance that after calling > > aio_context_release(dbs->ctx); the dma_blk_cb function is called before > > the assertion and dbs->acb is set to NULL again at line 121. Thus when > > we run assert at line 181 it will fail. > > > > softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. > > > > Reported-by: Francisco Londono <f.londono@samsung.com> > > Signed-off-by: Tong Zhang <t.zhang2@samsung.com> > > --- > > softmmu/dma-helpers.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c > > index 7820fec54c..cb81017928 100644 > > --- a/softmmu/dma-helpers.c > > +++ b/softmmu/dma-helpers.c > > @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) > > aio_context_acquire(dbs->ctx); > > dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, > > dma_blk_cb, dbs, dbs->io_func_opaque); > > - aio_context_release(dbs->ctx); > > assert(dbs->acb); > > + aio_context_release(dbs->ctx); > > } > > > > static void dma_aio_cancel(BlockAIOCB *acb) > > I'm fairly new to that code, but I wonder what prevents dma_blk_cb() to > run after you reshuffled the code? > IMO if the assert is to test whether io_func returns a non-NULL value shouldn't it be immediately after calling io_func. Also... as suggested by commit 6bee44ea346aed24e12d525daf10542d695508db > dma: the passed io_func does not return NULL Thanks, - Tong > After all, acquire/release is only around the dbs->io_func() call, so I > don't immediately see how it interacts with re-entrance? > > -- > Thanks, > > David / dhildenb >
On 01.06.22 02:20, Tong Zhang wrote: > Hi David, > > On Mon, May 30, 2022 at 9:19 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 27.04.22 22:51, Tong Zhang wrote: >>> assert(dbs->acb) is meant to check the return value of io_func per >>> documented in commit 6bee44ea34 ("dma: the passed io_func does not >>> return NULL"). However, there is a chance that after calling >>> aio_context_release(dbs->ctx); the dma_blk_cb function is called before >>> the assertion and dbs->acb is set to NULL again at line 121. Thus when >>> we run assert at line 181 it will fail. >>> >>> softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. >>> >>> Reported-by: Francisco Londono <f.londono@samsung.com> >>> Signed-off-by: Tong Zhang <t.zhang2@samsung.com> >>> --- >>> softmmu/dma-helpers.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c >>> index 7820fec54c..cb81017928 100644 >>> --- a/softmmu/dma-helpers.c >>> +++ b/softmmu/dma-helpers.c >>> @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) >>> aio_context_acquire(dbs->ctx); >>> dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, >>> dma_blk_cb, dbs, dbs->io_func_opaque); >>> - aio_context_release(dbs->ctx); >>> assert(dbs->acb); >>> + aio_context_release(dbs->ctx); >>> } >>> >>> static void dma_aio_cancel(BlockAIOCB *acb) >> >> I'm fairly new to that code, but I wonder what prevents dma_blk_cb() to >> run after you reshuffled the code? >> > > IMO if the assert is to test whether io_func returns a non-NULL value > shouldn't it be immediately after calling io_func. > Also... as suggested by commit 6bee44ea346aed24e12d525daf10542d695508db > > dma: the passed io_func does not return NULL Yes, but I just don't see how it would fix the assertion you document in the patch description. The locking change to fix the assertion doesn't make any sense to me, and most probably I am missing something important :) -- Thanks, David / dhildenb
On Wed, Jun 01, 2022 at 10:00:50AM +0200, David Hildenbrand wrote: > On 01.06.22 02:20, Tong Zhang wrote: > > Hi David, > > > > On Mon, May 30, 2022 at 9:19 AM David Hildenbrand <david@redhat.com> wrote: > >> > >> On 27.04.22 22:51, Tong Zhang wrote: > >>> assert(dbs->acb) is meant to check the return value of io_func per > >>> documented in commit 6bee44ea34 ("dma: the passed io_func does not > >>> return NULL"). However, there is a chance that after calling > >>> aio_context_release(dbs->ctx); the dma_blk_cb function is called before > >>> the assertion and dbs->acb is set to NULL again at line 121. Thus when > >>> we run assert at line 181 it will fail. > >>> > >>> softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. > >>> > >>> Reported-by: Francisco Londono <f.londono@samsung.com> > >>> Signed-off-by: Tong Zhang <t.zhang2@samsung.com> > >>> --- > >>> softmmu/dma-helpers.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c > >>> index 7820fec54c..cb81017928 100644 > >>> --- a/softmmu/dma-helpers.c > >>> +++ b/softmmu/dma-helpers.c > >>> @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) > >>> aio_context_acquire(dbs->ctx); > >>> dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, > >>> dma_blk_cb, dbs, dbs->io_func_opaque); > >>> - aio_context_release(dbs->ctx); > >>> assert(dbs->acb); > >>> + aio_context_release(dbs->ctx); > >>> } > >>> > >>> static void dma_aio_cancel(BlockAIOCB *acb) > >> > >> I'm fairly new to that code, but I wonder what prevents dma_blk_cb() to > >> run after you reshuffled the code? > >> > > > > IMO if the assert is to test whether io_func returns a non-NULL value > > shouldn't it be immediately after calling io_func. > > Also... as suggested by commit 6bee44ea346aed24e12d525daf10542d695508db > > > dma: the passed io_func does not return NULL > > Yes, but I just don't see how it would fix the assertion you document in > the patch description. The locking change to fix the assertion doesn't > make any sense to me, and most probably I am missing something important :) The other thread will invoke dma_blk_cb(), which modifies dbs->acb, when it can take the lock. Therefore dbs->acb may contain a value different from our io_func()'s return value by the time we perform the assertion check (that's the race). This patch makes sense to me. Can you rephrase your concern? Stefan
On 01.06.22 15:24, Stefan Hajnoczi wrote: > On Wed, Jun 01, 2022 at 10:00:50AM +0200, David Hildenbrand wrote: >> On 01.06.22 02:20, Tong Zhang wrote: >>> Hi David, >>> >>> On Mon, May 30, 2022 at 9:19 AM David Hildenbrand <david@redhat.com> wrote: >>>> >>>> On 27.04.22 22:51, Tong Zhang wrote: >>>>> assert(dbs->acb) is meant to check the return value of io_func per >>>>> documented in commit 6bee44ea34 ("dma: the passed io_func does not >>>>> return NULL"). However, there is a chance that after calling >>>>> aio_context_release(dbs->ctx); the dma_blk_cb function is called before >>>>> the assertion and dbs->acb is set to NULL again at line 121. Thus when >>>>> we run assert at line 181 it will fail. >>>>> >>>>> softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. >>>>> >>>>> Reported-by: Francisco Londono <f.londono@samsung.com> >>>>> Signed-off-by: Tong Zhang <t.zhang2@samsung.com> >>>>> --- >>>>> softmmu/dma-helpers.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c >>>>> index 7820fec54c..cb81017928 100644 >>>>> --- a/softmmu/dma-helpers.c >>>>> +++ b/softmmu/dma-helpers.c >>>>> @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) >>>>> aio_context_acquire(dbs->ctx); >>>>> dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, >>>>> dma_blk_cb, dbs, dbs->io_func_opaque); >>>>> - aio_context_release(dbs->ctx); >>>>> assert(dbs->acb); >>>>> + aio_context_release(dbs->ctx); >>>>> } >>>>> >>>>> static void dma_aio_cancel(BlockAIOCB *acb) >>>> >>>> I'm fairly new to that code, but I wonder what prevents dma_blk_cb() to >>>> run after you reshuffled the code? >>>> >>> >>> IMO if the assert is to test whether io_func returns a non-NULL value >>> shouldn't it be immediately after calling io_func. >>> Also... as suggested by commit 6bee44ea346aed24e12d525daf10542d695508db >>> > dma: the passed io_func does not return NULL >> >> Yes, but I just don't see how it would fix the assertion you document in >> the patch description. The locking change to fix the assertion doesn't >> make any sense to me, and most probably I am missing something important :) > > The other thread will invoke dma_blk_cb(), which modifies dbs->acb, when > it can take the lock. Therefore dbs->acb may contain a value different > from our io_func()'s return value by the time we perform the assertion > check (that's the race). > > This patch makes sense to me. Can you rephrase your concern? The locking is around dbs->io_func(). aio_context_acquire(dbs->ctx); dbs->acb = dbs->io_func() aio_context_release(dbs->ctx); So where exactly would the lock that's now still held stop someone from modifying dbs->acb = NULL at the beginning of the function, which seems to be not protected by that lock? Maybe I'm missing some locking magic due to the lock being a recursive lock. -- Thanks, David / dhildenb
On Wed, 1 Jun 2022 at 14:29, David Hildenbrand <david@redhat.com> wrote: > > On 01.06.22 15:24, Stefan Hajnoczi wrote: > > On Wed, Jun 01, 2022 at 10:00:50AM +0200, David Hildenbrand wrote: > >> On 01.06.22 02:20, Tong Zhang wrote: > >>> Hi David, > >>> > >>> On Mon, May 30, 2022 at 9:19 AM David Hildenbrand <david@redhat.com> wrote: > >>>> > >>>> On 27.04.22 22:51, Tong Zhang wrote: > >>>>> assert(dbs->acb) is meant to check the return value of io_func per > >>>>> documented in commit 6bee44ea34 ("dma: the passed io_func does not > >>>>> return NULL"). However, there is a chance that after calling > >>>>> aio_context_release(dbs->ctx); the dma_blk_cb function is called before > >>>>> the assertion and dbs->acb is set to NULL again at line 121. Thus when > >>>>> we run assert at line 181 it will fail. > >>>>> > >>>>> softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. > >>>>> > >>>>> Reported-by: Francisco Londono <f.londono@samsung.com> > >>>>> Signed-off-by: Tong Zhang <t.zhang2@samsung.com> > >>>>> --- > >>>>> softmmu/dma-helpers.c | 2 +- > >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c > >>>>> index 7820fec54c..cb81017928 100644 > >>>>> --- a/softmmu/dma-helpers.c > >>>>> +++ b/softmmu/dma-helpers.c > >>>>> @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) > >>>>> aio_context_acquire(dbs->ctx); > >>>>> dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, > >>>>> dma_blk_cb, dbs, dbs->io_func_opaque); > >>>>> - aio_context_release(dbs->ctx); > >>>>> assert(dbs->acb); > >>>>> + aio_context_release(dbs->ctx); > >>>>> } > >>>>> > >>>>> static void dma_aio_cancel(BlockAIOCB *acb) > >>>> > >>>> I'm fairly new to that code, but I wonder what prevents dma_blk_cb() to > >>>> run after you reshuffled the code? > >>>> > >>> > >>> IMO if the assert is to test whether io_func returns a non-NULL value > >>> shouldn't it be immediately after calling io_func. > >>> Also... as suggested by commit 6bee44ea346aed24e12d525daf10542d695508db > >>> > dma: the passed io_func does not return NULL > >> > >> Yes, but I just don't see how it would fix the assertion you document in > >> the patch description. The locking change to fix the assertion doesn't > >> make any sense to me, and most probably I am missing something important :) > > > > The other thread will invoke dma_blk_cb(), which modifies dbs->acb, when > > it can take the lock. Therefore dbs->acb may contain a value different > > from our io_func()'s return value by the time we perform the assertion > > check (that's the race). > > > > This patch makes sense to me. Can you rephrase your concern? > > The locking is around dbs->io_func(). > > aio_context_acquire(dbs->ctx); > dbs->acb = dbs->io_func() > aio_context_release(dbs->ctx); > > > So where exactly would the lock that's now still held stop someone from > modifying dbs->acb = NULL at the beginning of the function, which seems > to be not protected by that lock? > > Maybe I'm missing some locking magic due to the lock being a recursive lock. Tong Zhang: Can you share a backtrace of all threads when the assertion failure occurs? Stefan
Hi Stefan, On Wed, Jun 1, 2022 at 6:56 AM Stefan Hajnoczi <stefanha@gmail.com> wrote: > > > > This patch makes sense to me. Can you rephrase your concern? > > > > The locking is around dbs->io_func(). > > > > aio_context_acquire(dbs->ctx); > > dbs->acb = dbs->io_func() > > aio_context_release(dbs->ctx); > > > > > > So where exactly would the lock that's now still held stop someone from > > modifying dbs->acb = NULL at the beginning of the function, which seems > > to be not protected by that lock? > > > > Maybe I'm missing some locking magic due to the lock being a recursive lock. > > Tong Zhang: Can you share a backtrace of all threads when the > assertion failure occurs? > Sorry I couldn't get the trace now -- but I can tell that we have some internal code uses this dma related code and will grab dbs->ctx lock in another thread and could overwrite dbs->acb. From my understanding, one of the reasons that the lock is required here is to protect dbs->acb, we could not reliably test io_func()'s return value after releasing the lock here. Since this code affects our internal code base and I did not reproduce on master branch, feel free to ignore it. - Tong > Stefan
On Thu, Jun 2, 2022, 02:04 Tong Zhang <ztong0001@gmail.com> wrote: > > Hi Stefan, > > On Wed, Jun 1, 2022 at 6:56 AM Stefan Hajnoczi <stefanha@gmail.com> wrote: > > > > > > This patch makes sense to me. Can you rephrase your concern? > > > > > > The locking is around dbs->io_func(). > > > > > > aio_context_acquire(dbs->ctx); > > > dbs->acb = dbs->io_func() > > > aio_context_release(dbs->ctx); > > > > > > > > > So where exactly would the lock that's now still held stop someone from > > > modifying dbs->acb = NULL at the beginning of the function, which seems > > > to be not protected by that lock? > > > > > > Maybe I'm missing some locking magic due to the lock being a recursive lock. > > > > Tong Zhang: Can you share a backtrace of all threads when the > > assertion failure occurs? > > > Sorry I couldn't get the trace now -- but I can tell that we have some > internal code uses > this dma related code and will grab dbs->ctx lock in another thread > and could overwrite dbs->acb. > > From my understanding, one of the reasons that the lock is required > here is to protect dbs->acb, > we could not reliably test io_func()'s return value after releasing > the lock here. > > Since this code affects our internal code base and I did not reproduce > on master branch, > feel free to ignore it. If this patch is unnecessary on qemu.git/master it raises the question whether aio_context_acquire/release() should be removed from dma_blk_cb(). It was added by: commit 1919631e6b5562e474690853eca3c35610201e16 Author: Paolo Bonzini <pbonzini@redhat.com> Date: Mon Feb 13 14:52:31 2017 +0100 block: explicitly acquire aiocontext in bottom halves that need it Paolo: Is dma_blk_cb() called without the AioContext lock (by virtio-scsi or any code path with IOThreads)? David pointed out that if that's the case then the dbs->acb is accessed without the lock at the start of dma_blk_cb(). Stefan
+Emanuele / Alexander / Stefan On 27/4/22 22:51, Tong Zhang wrote: > assert(dbs->acb) is meant to check the return value of io_func per > documented in commit 6bee44ea34 ("dma: the passed io_func does not > return NULL"). However, there is a chance that after calling > aio_context_release(dbs->ctx); the dma_blk_cb function is called before > the assertion and dbs->acb is set to NULL again at line 121. Thus when > we run assert at line 181 it will fail. > > softmmu/dma-helpers.c:181: dma_blk_cb: Assertion `dbs->acb' failed. > > Reported-by: Francisco Londono <f.londono@samsung.com> > Signed-off-by: Tong Zhang <t.zhang2@samsung.com> > --- > softmmu/dma-helpers.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c > index 7820fec54c..cb81017928 100644 > --- a/softmmu/dma-helpers.c > +++ b/softmmu/dma-helpers.c > @@ -177,8 +177,8 @@ static void dma_blk_cb(void *opaque, int ret) > aio_context_acquire(dbs->ctx); > dbs->acb = dbs->io_func(dbs->offset, &dbs->iov, > dma_blk_cb, dbs, dbs->io_func_opaque); > - aio_context_release(dbs->ctx); > assert(dbs->acb); > + aio_context_release(dbs->ctx); > } > > static void dma_aio_cancel(BlockAIOCB *acb)
© 2016 - 2025 Red Hat, Inc.