[v5] blkdebug: fix racing condition when iterating on

[PATCH v5 0/6] blkdebug: fix racing condition when iterating on

Posted by Emanuele Giuseppe Esposito 2 years, 10 months ago

When qemu_coroutine_enter is executed in a loop
(even QLIST_FOREACH_SAFE), the entered coroutine can modify the list,
for example by removing an element, causing problems when control
is given back to the caller, which continues iterating on the same list.
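
To illustrate the hazard described above, here is a minimal sketch in plain C. It does not use the QEMU list macros or coroutines: struct node, unlink_node and walk_safe_style are hypothetical names, and the "callback" is just a stand-in for a coroutine that removes the successor of the current node. Nodes are only unlinked, never freed, so that the demo stays well defined; in real code the stale node could already have been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node, not a QEMU structure. */
struct node {
    int id;
    bool on_list;
    struct node *next;
};

/* Unlink n from the list at *headp; the memory is deliberately kept. */
static void unlink_node(struct node **headp, struct node *n)
{
    for (struct node **pp = headp; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->on_list = false;
            return;
        }
    }
}

/*
 * Walk the list the way a _SAFE iteration does: the successor is saved
 * before the loop body runs. Returns how many visited nodes had already
 * been removed from the list by the time they were reached.
 */
static int walk_safe_style(struct node **headp)
{
    int stale_visits = 0;
    struct node *n, *next;

    for (n = *headp; n != NULL; n = next) {
        next = n->next;          /* saved before the callback runs */
        if (!n->on_list) {
            stale_visits++;      /* a real walk would touch a dead node here */
            continue;
        }
        /* Stand-in for entering a coroutine that removes the successor. */
        if (n->next != NULL) {
            unlink_node(headp, n->next);
        }
    }
    return stale_visits;
}
```

With a three-element list, the walk still reaches the second node through the saved pointer even though the callback has already removed it, which is the situation that a _SAFE loop cannot protect against.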

Patch 1 solves the issue in blkdebug_debug_resume by restarting
the list walk after every coroutine_enter if the list has to be fully iterated.
Patches 2, 3 and 4 aim to fix blkdebug_debug_event by gathering
all actions that the rules make in a counter and invoking
the respective coroutine_yield only after processing all requests.
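
The "restart the walk" idea from patch 1 can be sketched in plain C as follows. This is not the code from the series: struct req, add_req, take_first, enter_request, resume_by_tag and list_length are hypothetical names, tags are assumed to be string literals, and enter_request only simulates a resumed coroutine by dropping an unrelated request tagged "victim".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a suspended request. */
struct req {
    const char *tag;            /* string literals only in this sketch */
    struct req *next;
};

static struct req *head;        /* list of suspended requests */

static void add_req(const char *tag)
{
    struct req *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->tag = tag;
    r->next = head;
    head = r;
}

/* Unlink the first request with this tag; the caller owns the result. */
static struct req *take_first(const char *tag)
{
    for (struct req **pp = &head; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp((*pp)->tag, tag) == 0) {
            struct req *r = *pp;
            *pp = r->next;
            return r;
        }
    }
    return NULL;
}

/* Stand-in for entering a coroutine: it removes some other list element. */
static void enter_request(struct req *r)
{
    (void)r;
    free(take_first("victim")); /* free(NULL) is a no-op */
}

/*
 * Resume one or all requests with the given tag. Every pass searches
 * again from the head, so no list pointer is kept across enter_request().
 */
static int resume_by_tag(const char *tag, bool all)
{
    int resumed = 0;
    struct req *r;

    while ((r = take_first(tag)) != NULL) {
        enter_request(r);
        free(r);
        resumed++;
        if (!all) {
            break;
        }
    }
    return resumed;
}

static int list_length(void)
{
    int n = 0;
    for (struct req *r = head; r != NULL; r = r->next) {
        n++;
    }
    return n;
}
```

Restarting from the head costs a rescan per resumed request, but the list of suspended requests is expected to be short, and the walk never depends on a pointer that the resumed code may have invalidated.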

Patches 5 and 6 are somewhat independent of the others: patch 5 removes the need
for the new_state field, and patch 6 adds a lock to
protect rules and suspended_reqs; right now everything works because
they are protected by the AioContext lock.
This is in preparation for the current proposal to remove the AioContext
lock and instead use finer-grained locks to allow multiple
iothreads to execute in the same block device.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
v5:
* Add comment in patch 1 to explain why we don't need _SAFE in for loop
* Move the state update (s->state = new_state) in patch 5, to maintain
  the same existing effect in all patches

Emanuele Giuseppe Esposito (6):
  blkdebug: refactor removal of a suspended request
  blkdebug: move post-resume handling to resume_req_by_tag
  blkdebug: track all actions
  blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
  block/blkdebug: remove new_state field and instead use a local
    variable
  blkdebug: protect rules and suspended_reqs with a lock

 block/blkdebug.c | 136 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 92 insertions(+), 44 deletions(-)

-- 
2.31.1

Re: [PATCH v5 0/6] blkdebug: fix racing condition when iterating on

Posted by Max Reitz 2 years, 9 months ago

On 14.06.21 10:29, Emanuele Giuseppe Esposito wrote:
> When qemu_coroutine_enter is executed in a loop
> (even QLIST_FOREACH_SAFE), the entered coroutine can modify the list,
> for example by removing an element, causing problems when control
> is given back to the caller, which continues iterating on the same list.
>
> Patch 1 solves the issue in blkdebug_debug_resume by restarting
> the list walk after every coroutine_enter if the list has to be fully iterated.
> Patches 2, 3 and 4 aim to fix blkdebug_debug_event by gathering
> all actions that the rules make in a counter and invoking
> the respective coroutine_yield only after processing all requests.
>
> Patches 5 and 6 are somewhat independent of the others: patch 5 removes the need
> for the new_state field, and patch 6 adds a lock to
> protect rules and suspended_reqs; right now everything works because
> they are protected by the AioContext lock.
> This is in preparation for the current proposal to remove the AioContext
> lock and instead use finer-grained locks to allow multiple
> iothreads to execute in the same block device.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> v5:
> * Add comment in patch 1 to explain why we don't need _SAFE in for loop
> * Move the state update (s->state = new_state) in patch 5, to maintain
>    the same existing effect in all patches

I’m not sure whether this actually fixes a user-visible bug…?  The first 
paragraph makes it sound like it, but there is no test, so I’m not sure.

I’m mostly asking because of freeze; but you make it sound like there’s 
a bug, and as this only concerns blkdebug (i.e., a block driver used 
only for testing), I feel like applying this series after soft freeze 
should be fine, so:

Thanks, I’ve applied this series to my block branch:

https://github.com/XanClic/qemu/commits/block

Max