Hot path reqs_lock critical sections are very small; the only large critical
sections happen when a request waits for serialising requests, and these
should never happen in usual circumstances.
We do not want these small critical sections to yield in any case,
which calls for using a spinlock while writing the list. The reqs_lock
is still used to protect the individual requests' CoQueue. Removals
from the request list must still synchronize with concurrent walks of
the list; for that, taking and immediately releasing reqs_lock
(qemu_co_mutex_lock_unlock) is enough, and it avoids cacheline bouncing
in the common case of no concurrent reader.
The reqs_lock is also used to protect the flush generation counts, but
that's unrelated.
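
For reference, removals synchronize against readers that walk the list
under reqs_lock. A simplified sketch of such a walk (loosely based on
wait_serialising_requests() in block/io.c; the retry loop and overlap
checks are elided):

    qemu_co_mutex_lock(&bs->reqs_lock);
    QLIST_FOREACH(req, &bs->tracked_requests, list) {
        if (req != self && tracked_request_overlaps(req, offset, bytes)) {
            /* Sleeping releases reqs_lock, letting a removal's
             * lock/unlock pair in tracked_request_end() complete.
             */
            qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
            break;
        }
    }
    qemu_co_mutex_unlock(&bs->reqs_lock);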
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 1 +
block/io.c | 25 ++++++++++++++++++++-----
include/block/block_int.h | 11 ++++++++---
3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/block.c b/block.c
index 3b2ed29..7ba6afe 100644
--- a/block.c
+++ b/block.c
@@ -234,6 +234,7 @@ BlockDriverState *bdrv_new(void)
QLIST_INIT(&bs->op_blockers[i]);
}
notifier_with_return_list_init(&bs->before_write_notifiers);
+ qemu_spin_init(&bs->reqs_list_write_lock);
qemu_co_mutex_init(&bs->reqs_lock);
bs->refcnt = 1;
bs->aio_context = qemu_get_aio_context();
diff --git a/block/io.c b/block/io.c
index 7af9d47..476807d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -374,14 +374,29 @@ void bdrv_drain_all(void)
*/
static void tracked_request_end(BdrvTrackedRequest *req)
{
+ BlockDriverState *bs = req->bs;
+
if (req->serialising) {
- atomic_dec(&req->bs->serialising_in_flight);
+ atomic_dec(&bs->serialising_in_flight);
}
- qemu_co_mutex_lock(&req->bs->reqs_lock);
+ /* Note that there can be a concurrent visit while we remove the request
+ * from the list, so we need to...
+ */
+ qemu_spin_lock(&bs->reqs_list_write_lock);
QLIST_REMOVE(req, list);
+ qemu_spin_unlock(&bs->reqs_list_write_lock);
+
+ /* ... wait for it to end before we leave. qemu_co_mutex_lock_unlock
+ * avoids cacheline bouncing in the common case of no concurrent
+ * reader.
+ */
+ qemu_co_mutex_lock_unlock(&bs->reqs_lock);
+
+ /* Now no coroutine can add itself to the wait queue, so it is
+ * safe to call qemu_co_queue_restart_all outside the reqs_lock.
+ */
qemu_co_queue_restart_all(&req->wait_queue);
- qemu_co_mutex_unlock(&req->bs->reqs_lock);
}
/**
@@ -406,9 +421,9 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
qemu_co_queue_init(&req->wait_queue);
- qemu_co_mutex_lock(&bs->reqs_lock);
+ qemu_spin_lock(&bs->reqs_list_write_lock);
QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
- qemu_co_mutex_unlock(&bs->reqs_lock);
+ qemu_spin_unlock(&bs->reqs_list_write_lock);
}
static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 42b49f5..b298de8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -78,9 +78,10 @@ typedef struct BdrvTrackedRequest {
QLIST_ENTRY(BdrvTrackedRequest) list;
Coroutine *co; /* owner, used for deadlock detection */
- CoQueue wait_queue; /* coroutines blocked on this request */
-
struct BdrvTrackedRequest *waiting_for;
+
+ /* Protected by BlockDriverState's reqs_lock. */
+ CoQueue wait_queue; /* coroutines blocked on this request */
} BdrvTrackedRequest;
struct BlockDriver {
@@ -626,11 +627,15 @@ struct BlockDriverState {
int quiesce_counter;
unsigned int write_gen; /* Current data generation */
- /* Protected by reqs_lock. */
+ /* Writes are protected by reqs_list_write_lock. Reads take
+ * reqs_lock so that removals can easily synchronize with walks.
+ */
QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
CoQueue flush_queue; /* Serializing flush queue */
bool active_flush_req; /* Flush request in flight? */
unsigned int flushed_gen; /* Flushed write generation */
+
+ QemuSpin reqs_list_write_lock;
CoMutex reqs_lock;
};
--
2.9.3
On Thu, Apr 20, 2017 at 02:00:55PM +0200, Paolo Bonzini wrote:
> Hot path reqs_lock critical sections are very small; the only large critical
> sections happen when a request waits for serialising requests, and these
> should never happen in usual circumstances.
>
> We do not want these small critical sections to yield in any case,
> which calls for using a spinlock while writing the list.

Is this patch purely an optimization?

I'm hesitant about using spinlocks in userspace. There are cases where
the thread is descheduled that are beyond our control. Nested virt will
probably make things worse. People have been optimizing and trying
paravirt approaches to kernel spinlocks for these reasons for years.

Isn't a futex-based lock efficient enough? That way we don't hog the
CPU when there is contention.

Also, there are no performance results included in this patch that
justify the spinlock.
On 04/05/2017 16:59, Stefan Hajnoczi wrote:
> On Thu, Apr 20, 2017 at 02:00:55PM +0200, Paolo Bonzini wrote:
>> Hot path reqs_lock critical sections are very small; the only large critical
>> sections happen when a request waits for serialising requests, and these
>> should never happen in usual circumstances.
>>
>> We do not want these small critical sections to yield in any case,
>> which calls for using a spinlock while writing the list.
>
> Is this patch purely an optimization?

Yes, it is, and pretty much a no-op until we have true multiqueue. But
I expect it to have a significant effect for multiqueue.

> I'm hesitant about using spinlocks in userspace. There are cases where
> the thread is descheduled that are beyond our control. Nested virt will
> probably make things worse. People have been optimizing and trying
> paravirt approaches to kernel spinlocks for these reasons for years.

This is true, but here we're talking about a 5-10 instruction window for
preemption; it matches the usage of spinlocks in other parts of QEMU.

The long critical sections, which only happen in combination with
copy-on-read or RMW (large logical block sizes on the host), take the
CoMutex.

On one hand it's true that the more you nest, the more things get worse.
On the other hand there can only ever be contention with multiqueue, and
the multiqueue scenarios are going to use pinning.

> Isn't a futex-based lock efficient enough? That way we don't hog the
> CPU when there is contention.

It is efficient when there is no contention, but when there is, the
latency goes up by several orders of magnitude.

Paolo

> Also, there are no performance results included in this patch that
> justify the spinlock.
On Thu, May 04, 2017 at 06:06:39PM +0200, Paolo Bonzini wrote:
> On 04/05/2017 16:59, Stefan Hajnoczi wrote:
> > On Thu, Apr 20, 2017 at 02:00:55PM +0200, Paolo Bonzini wrote:
> >> Hot path reqs_lock critical sections are very small; the only large critical
> >> sections happen when a request waits for serialising requests, and these
> >> should never happen in usual circumstances.
> >>
> >> We do not want these small critical sections to yield in any case,
> >> which calls for using a spinlock while writing the list.
> >
> > Is this patch purely an optimization?
>
> Yes, it is, and pretty much a no-op until we have true multiqueue. But
> I expect it to have a significant effect for multiqueue.
>
> > I'm hesitant about using spinlocks in userspace. There are cases where
> > the thread is descheduled that are beyond our control. Nested virt will
> > probably make things worse. People have been optimizing and trying
> > paravirt approaches to kernel spinlocks for these reasons for years.
>
> This is true, but here we're talking about a 5-10 instruction window for
> preemption; it matches the usage of spinlocks in other parts of QEMU.

Only util/qht.c uses spinlocks, it's not a widely used primitive.

> The long critical sections, which only happen in combination with
> copy-on-read or RMW (large logical block sizes on the host), take the
> CoMutex.
>
> On one hand it's true that the more you nest, the more things get worse.
> On the other hand there can only ever be contention with multiqueue,
> and the multiqueue scenarios are going to use pinning.
>
> > Isn't a futex-based lock efficient enough? That way we don't hog the
> > CPU when there is contention.
>
> It is efficient when there is no contention, but when there is, the
> latency goes up by several orders of magnitude.

Doesn't glibc spin for a while before waiting on the futex? i.e. the
best of both worlds.

Stefan
On 05/05/2017 12:25, Stefan Hajnoczi wrote:
> On Thu, May 04, 2017 at 06:06:39PM +0200, Paolo Bonzini wrote:
>> On 04/05/2017 16:59, Stefan Hajnoczi wrote:
>>> On Thu, Apr 20, 2017 at 02:00:55PM +0200, Paolo Bonzini wrote:
>>>> Hot path reqs_lock critical sections are very small; the only large critical
>>>> sections happen when a request waits for serialising requests, and these
>>>> should never happen in usual circumstances.
>>>>
>>>> We do not want these small critical sections to yield in any case,
>>>> which calls for using a spinlock while writing the list.
>>>
>>> Is this patch purely an optimization?
>>
>> Yes, it is, and pretty much a no-op until we have true multiqueue. But
>> I expect it to have a significant effect for multiqueue.
>>
>>> I'm hesitant about using spinlocks in userspace. There are cases where
>>> the thread is descheduled that are beyond our control. Nested virt will
>>> probably make things worse. People have been optimizing and trying
>>> paravirt approaches to kernel spinlocks for these reasons for years.
>>
>> This is true, but here we're talking about a 5-10 instruction window for
>> preemption; it matches the usage of spinlocks in other parts of QEMU.
>
> Only util/qht.c uses spinlocks, it's not a widely used primitive.

Right, but the idea is the same---very short, heavy and
performance-critical cases use spinlocks. (util/qht.c is used heavily
in TCG mode).

>> It is efficient when there is no contention, but when there is, the
>> latency goes up by several orders of magnitude.
>
> Doesn't glibc spin for a while before waiting on the futex? i.e. the
> best of both worlds.

You have to specify that manually with pthread_mutexattr_settype(...,
PTHREAD_MUTEX_ADAPTIVE_NP). It is not enabled by default because IIUC
the adaptive one doesn't support pthread_mutex_timedlock.

Paolo
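
For reference, requesting the adaptive type looks roughly like the
following minimal, glibc-specific sketch (not code from QEMU or from
this thread):

    #define _GNU_SOURCE
    #include <pthread.h>

    pthread_mutex_t lock;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* Spin briefly on contention before falling back to a futex wait. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);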
On Fri, May 05, 2017 at 12:45:38PM +0200, Paolo Bonzini wrote:
>
>
> On 05/05/2017 12:25, Stefan Hajnoczi wrote:
> > On Thu, May 04, 2017 at 06:06:39PM +0200, Paolo Bonzini wrote:
> >> On 04/05/2017 16:59, Stefan Hajnoczi wrote:
> >>> On Thu, Apr 20, 2017 at 02:00:55PM +0200, Paolo Bonzini wrote:
> >>>> Hot path reqs_lock critical sections are very small; the only large critical
> >>>> sections happen when a request waits for serialising requests, and these
> >>>> should never happen in usual circumstances.
> >>>>
> >>>> We do not want these small critical sections to yield in any case,
> >>>> which calls for using a spinlock while writing the list.
> >>>
> >>> Is this patch purely an optimization?
> >>
> >> Yes, it is, and pretty much a no-op until we have true multiqueue. But
> >> I expect it to have a significant effect for multiqueue.
> >>
> >>> I'm hesitant about using spinlocks in userspace. There are cases where
> >>> the thread is descheduled that are beyond our control. Nested virt will
> >>> probably make things worse. People have been optimizing and trying
> >>> paravirt approaches to kernel spinlocks for these reasons for years.
> >>
> >> This is true, but here we're talking about a 5-10 instruction window for
> >> preemption; it matches the usage of spinlocks in other parts of QEMU.
> >
> > Only util/qht.c uses spinlocks, it's not a widely used primitive.
>
> Right, but the idea is the same---very short, heavy and
> performance-critical cases use spinlocks. (util/qht.c is used heavily
> in TCG mode).
>
> >> It is efficient when there is no contention, but when there is, the
> >> latency goes up by several orders of magnitude.
> >
> > Doesn't glibc spin for a while before waiting on the futex? i.e. the
> > best of both worlds.
>
> You have to specify that manually with pthread_mutexattr_settype(...,
> PTHREAD_MUTEX_ADAPTIVE_NP). It is not enabled by default because IIUC
> the adaptive one doesn't support pthread_mutex_timedlock.

If you want to use spinlocks in QEMU please document strict rules that
ensure they will be used safely. For example, I'd be comfortable with:

Spinlock regions may only call functions in the same source file or
functions explicitly documented as spinlock-safe. Only memory accesses
and computation may be performed; do not make system calls or invoke
library functions.

Stefan
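
As a hypothetical illustration of those proposed rules, using the lock
and list from this patch (this snippet is not part of the thread):

    /* OK: only memory accesses and computation inside the region. */
    qemu_spin_lock(&bs->reqs_list_write_lock);
    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
    qemu_spin_unlock(&bs->reqs_list_write_lock);

    /* NOT OK: a library call, which may block or take its own locks,
     * inside the region.
     */
    qemu_spin_lock(&bs->reqs_list_write_lock);
    fprintf(stderr, "inserted %p\n", (void *)req);
    qemu_spin_unlock(&bs->reqs_list_write_lock);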
On 08/05/2017 18:21, Stefan Hajnoczi wrote:
> If you want to use spinlocks in QEMU please document strict rules that
> ensure they will be used safely. For example, I'd be comfortable with:
>
> Spinlock regions may only call functions in the same source file or
> functions explicitly documented as spinlock-safe. Only memory accesses
> and computation may be performed; do not make system calls or invoke
> library functions.

Sounds good. But for now I'll just drop these patches in the interest
of simplicity.

Paolo