This is yet another tiny bit of the multiqueue work, this time affecting
the synchronization infrastructure for coroutines.  Currently, coroutines
synchronize between the main I/O thread and the dataplane iothread through
the AioContext lock.  However, for multiqueue a single BDS will be used
by multiple iothreads and hence multiple AioContexts.  This calls for
a different approach to coroutine synchronization, and this series is my
attempt.

After the previous series, coroutines are already bound to an AioContext
while they wait on a CoMutex.  Of course, currently a CoMutex is generally
not used across multiple iothreads, because you have to acquire/release
the AioContext around CoMutex critical sections.  This series is the
missing link between the aio_co_schedule/aio_co_wake infrastructure and
making BDS thread-safe: by making CoMutex thread-safe, it removes
the need for a "thread-based" mutex around it.

This will still need some changes in the formats because, for multiqueue,
CoMutexes would need to be used like "real" thread mutexes.  Code like
this:

    ...
    qemu_co_mutex_unlock()
    ... /* still access shared data, but don't yield */
    qemu_coroutine_yield()

might be required to use this other pattern:

    ... /* access shared data, but don't yield */
    qemu_co_mutex_unlock()
    qemu_coroutine_yield()

because adding a second AioContext is already introducing concurrency that
wasn't there before.  Still, even if you have to take concurrency into
account, multitasking between coroutines remains non-preemptive.  So, for
example, it is easy to synchronize between qemu_co_mutex_lock's yield
and the qemu_coroutine_enter in aio_co_schedule's bottom half.

CoMutex puts coroutines to sleep with qemu_coroutine_yield and wakes them
up with aio_co_wake.  I could have wrapped CoMutex's CoQueue with
a "regular" thread mutex or spinlock.  The resulting code would
have looked a lot like RFifoLock (with CoQueue replacing RFifoLock's
condition variable).
Instead, CoMutex is implemented from scratch and
CoQueue is made to depend on a CoMutex, similar to condition variables.
Most CoQueues already have a corresponding CoMutex, so this is not a big
deal; converting the others is left for a future series, but a surprising
number of drivers actually need no change.

The mutex algorithm comes from OSv; it only needs two to four atomic ops
for a lock-unlock pair (two when uncontended), and if necessary
we could even take OSv's support for wait morphing (which avoids the
thundering-herd problem) and add it to CoMutex and CoQueue.

Performance of CoMutex is comparable to pthread mutexes.  However, you
cannot make a direct comparison between CoMutex (fair) and pthread_mutex_t
(unfair).  For this reason the testcase also measures the performance of
a quick-and-dirty implementation of a fair mutex, based on MCS locks
and futexes.

Paolo

Paolo Bonzini (6):
  coroutine-lock: make CoMutex thread-safe
  coroutine-lock: add limited spinning to CoMutex
  test-aio-multithread: add performance comparison with thread-based
    mutexes
  coroutine-lock: place CoMutex before CoQueue in header
  coroutine-lock: add mutex argument to CoQueue APIs
  coroutine-lock: make CoRwlock thread-safe and fair

 block/backup.c               |   2 +-
 block/io.c                   |   4 +-
 block/nbd-client.c           |   2 +-
 block/qcow2-cluster.c        |   4 +-
 block/sheepdog.c             |   2 +-
 block/throttle-groups.c      |   2 +-
 hw/9pfs/9p.c                 |   2 +-
 include/qemu/coroutine.h     |  84 +++++++++------
 tests/test-aio-multithread.c | 250 +++++++++++++++++++++++++++++++++++++++++++
 util/qemu-coroutine-lock.c   | 247 ++++++++++++++++++++++++++++++++++++++----
 util/qemu-coroutine.c        |   2 +-
 util/trace-events            |   1 +
 12 files changed, 537 insertions(+), 65 deletions(-)

-- 
2.9.3
On Mon, 02/13 19:12, Paolo Bonzini wrote:
> This is yet another tiny bit of the multiqueue work, this time affecting
> the synchronization infrastructure for coroutines.  Currently, coroutines
> synchronize between the main I/O thread and the dataplane iothread through
> the AioContext lock.  However, for multiqueue a single BDS will be used
> by multiple iothreads and hence multiple AioContexts.  This calls for
> a different approach to coroutine synchronization, and this series is my
> attempt.

Looks good to me!

Reviewed-by: Fam Zheng <famz@redhat.com>

Fam
On Mon, Feb 13, 2017 at 07:12:38PM +0100, Paolo Bonzini wrote:
> This is yet another tiny bit of the multiqueue work, this time affecting
> the synchronization infrastructure for coroutines.  Currently, coroutines
> synchronize between the main I/O thread and the dataplane iothread through
> the AioContext lock.  However, for multiqueue a single BDS will be used
> by multiple iothreads and hence multiple AioContexts.  This calls for
> a different approach to coroutine synchronization, and this series is my
> attempt.
>
> After the previous series, coroutines are already bound to an AioContext
> while they wait on a CoMutex.  Of course, currently a CoMutex is generally
> not used across multiple iothreads, because you have to acquire/release
> the AioContext around CoMutex critical sections.  This series is the
> missing link between the aio_co_schedule/aio_co_wake infrastructure and
> making BDS thread-safe: by making CoMutex thread-safe, it removes
> the need for a "thread-based" mutex around it.
>
> This will still need some changes in the formats because, for multiqueue,
> CoMutexes would need to be used like "real" thread mutexes.  Code like
> this:
>
>     ...
>     qemu_co_mutex_unlock()
>     ... /* still access shared data, but don't yield */
>     qemu_coroutine_yield()
>
> might be required to use this other pattern:
>
>     ... /* access shared data, but don't yield */
>     qemu_co_mutex_unlock()
>     qemu_coroutine_yield()
>
> because adding a second AioContext is already introducing concurrency that
> wasn't there before.  Still, even if you have to take concurrency into
> account, multitasking between coroutines remains non-preemptive.  So, for
> example, it is easy to synchronize between qemu_co_mutex_lock's yield
> and the qemu_coroutine_enter in aio_co_schedule's bottom half.
>
> CoMutex puts coroutines to sleep with qemu_coroutine_yield and wakes them
> up with aio_co_wake.  I could have wrapped CoMutex's CoQueue with
> a "regular" thread mutex or spinlock.  The resulting code would
> have looked a lot like RFifoLock (with CoQueue replacing RFifoLock's
> condition variable).  Instead, CoMutex is implemented from scratch and
> CoQueue is made to depend on a CoMutex, similar to condition variables.
> Most CoQueues already have a corresponding CoMutex, so this is not a big
> deal; converting the others is left for a future series, but a surprising
> number of drivers actually need no change.
>
> The mutex algorithm comes from OSv; it only needs two to four atomic ops
> for a lock-unlock pair (two when uncontended), and if necessary
> we could even take OSv's support for wait morphing (which avoids the
> thundering-herd problem) and add it to CoMutex and CoQueue.
>
> Performance of CoMutex is comparable to pthread mutexes.  However, you
> cannot make a direct comparison between CoMutex (fair) and pthread_mutex_t
> (unfair).  For this reason the testcase also measures the performance of
> a quick-and-dirty implementation of a fair mutex, based on MCS locks
> and futexes.
>
> Paolo
>
> Paolo Bonzini (6):
>   coroutine-lock: make CoMutex thread-safe
>   coroutine-lock: add limited spinning to CoMutex
>   test-aio-multithread: add performance comparison with thread-based
>     mutexes
>   coroutine-lock: place CoMutex before CoQueue in header
>   coroutine-lock: add mutex argument to CoQueue APIs
>   coroutine-lock: make CoRwlock thread-safe and fair
>
>  block/backup.c               |   2 +-
>  block/io.c                   |   4 +-
>  block/nbd-client.c           |   2 +-
>  block/qcow2-cluster.c        |   4 +-
>  block/sheepdog.c             |   2 +-
>  block/throttle-groups.c      |   2 +-
>  hw/9pfs/9p.c                 |   2 +-
>  include/qemu/coroutine.h     |  84 +++++++++------
>  tests/test-aio-multithread.c | 250 +++++++++++++++++++++++++++++++++++++++++++
>  util/qemu-coroutine-lock.c   | 247 ++++++++++++++++++++++++++++++++++++++----
>  util/qemu-coroutine.c        |   2 +-
>  util/trace-events            |   1 +
>  12 files changed, 537 insertions(+), 65 deletions(-)
>
> -- 
> 2.9.3

Thanks, applied to my block tree:

https://github.com/stefanha/qemu/commits/block

Stefan