[PATCH 1/7] block/nvme: poll queues without q->lock

Posted by Stefan Hajnoczi 5 years, 5 months ago
A lot of CPU time is spent simply locking/unlocking q->lock during
polling. Check for completion outside the lock to make q->lock disappear
from the profile.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nvme.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index eb2f54dd9d..7eb4512666 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
 
     for (i = 0; i < s->nr_queues; i++) {
         NVMeQueuePair *q = s->queues[i];
+        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
+        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
+
+        /*
+         * q->lock isn't needed for checking completion because
+         * nvme_process_completion() only runs in the event loop thread and
+         * cannot race with itself.
+         */
+        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
+            continue;
+        }
+
         qemu_mutex_lock(&q->lock);
         while (nvme_process_completion(s, q)) {
             /* Keep polling */
-- 
2.25.3

Re: [PATCH 1/7] block/nvme: poll queues without q->lock
Posted by Sergio Lopez 5 years, 5 months ago
On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> A lot of CPU time is spent simply locking/unlocking q->lock during
> polling. Check for completion outside the lock to make q->lock disappear
> from the profile.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nvme.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index eb2f54dd9d..7eb4512666 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
>  
>      for (i = 0; i < s->nr_queues; i++) {
>          NVMeQueuePair *q = s->queues[i];
> +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> +
> +        /*
> +         * q->lock isn't needed for checking completion because
> +         * nvme_process_completion() only runs in the event loop thread and
> +         * cannot race with itself.
> +         */
> +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> +            continue;
> +        }
> +

IIUC, this is introducing an early check of the phase bit to determine
if there is something new in the queue.

I'm fine with this optimization, but I have the feeling that the
comment doesn't properly describe it.

Sergio.

>          qemu_mutex_lock(&q->lock);
>          while (nvme_process_completion(s, q)) {
>              /* Keep polling */
> -- 
> 2.25.3
> 
Re: [PATCH 1/7] block/nvme: poll queues without q->lock
Posted by Stefan Hajnoczi 5 years, 5 months ago
On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > A lot of CPU time is spent simply locking/unlocking q->lock during
> > polling. Check for completion outside the lock to make q->lock disappear
> > from the profile.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/nvme.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index eb2f54dd9d..7eb4512666 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> >  
> >      for (i = 0; i < s->nr_queues; i++) {
> >          NVMeQueuePair *q = s->queues[i];
> > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > +
> > +        /*
> > +         * q->lock isn't needed for checking completion because
> > +         * nvme_process_completion() only runs in the event loop thread and
> > +         * cannot race with itself.
> > +         */
> > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > +            continue;
> > +        }
> > +
> 
> IIUC, this is introducing an early check of the phase bit to determine
> if there is something new in the queue.
> 
> I'm fine with this optimization, but I have the feeling that the
> comment doesn't properly describe it.

I'm not sure I understand. The comment explains why it's safe not to
take q->lock. Normally it would be taken. Without the comment readers
could be confused why we ignore the locking rules here.

As for documenting the cqe->status expression itself, I didn't think of
explaining it since it's part of the theory of operation of this device.
Any polling driver will do this, there's nothing QEMU-specific or
unusual going on here.
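
To make that concrete, here is a minimal sketch of phase-bit polling
against a simplified, hypothetical queue struct (not the actual
NVMeQueuePair/NvmeCqe layout in block/nvme.c). It follows the same
convention as the check in the patch: a phase bit equal to the stored
phase means "nothing new yet".

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, simplified completion queue entry and ring state. */
    struct cqe {
        uint16_t status;        /* bit 0 is the NVMe phase tag */
        /* other fields elided */
    };

    struct cq {
        struct cqe *entries;    /* ring of completion entries */
        unsigned head;          /* next entry to consume */
        unsigned size;          /* number of entries in the ring */
        int phase;              /* phase bit of already-consumed entries */
    };

    /*
     * True if the entry at cq->head has been written by the device.
     * The controller toggles the phase tag each time it wraps around
     * the ring, so an entry whose phase bit still matches cq->phase is
     * stale. (Assumes a little-endian host; block/nvme.c uses
     * le16_to_cpu() on status.)
     */
    static bool cq_entry_ready(const struct cq *cq)
    {
        return (cq->entries[cq->head].status & 0x1) != cq->phase;
    }

    /* Consuming an entry advances head; flip the phase on wrap-around. */
    static void cq_advance(struct cq *cq)
    {
        if (++cq->head == cq->size) {
            cq->head = 0;
            cq->phase = !cq->phase;
        }
    }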

Would you like me to expand the comment explaining that NVMe polling
consists of checking the phase bit of the latest cqe to check for
readiness?

Or maybe I misunderstood? :)

Stefan
Re: [PATCH 1/7] block/nvme: poll queues without q->lock
Posted by Sergio Lopez 5 years, 5 months ago
On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote:
> On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > > A lot of CPU time is spent simply locking/unlocking q->lock during
> > > polling. Check for completion outside the lock to make q->lock disappear
> > > from the profile.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block/nvme.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/block/nvme.c b/block/nvme.c
> > > index eb2f54dd9d..7eb4512666 100644
> > > --- a/block/nvme.c
> > > +++ b/block/nvme.c
> > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> > >
> > >      for (i = 0; i < s->nr_queues; i++) {
> > >          NVMeQueuePair *q = s->queues[i];
> > > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > > +
> > > +        /*
> > > +         * q->lock isn't needed for checking completion because
> > > +         * nvme_process_completion() only runs in the event loop thread and
> > > +         * cannot race with itself.
> > > +         */
> > > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > > +            continue;
> > > +        }
> > > +
> >
> > IIUC, this is introducing an early check of the phase bit to determine
> > if there is something new in the queue.
> >
> > I'm fine with this optimization, but I have the feeling that the
> > comment doesn't properly describe it.
>
> I'm not sure I understand. The comment explains why it's safe not to
> take q->lock. Normally it would be taken. Without the comment readers
> could be confused why we ignore the locking rules here.
>
> As for documenting the cqe->status expression itself, I didn't think of
> explaining it since it's part of the theory of operation of this device.
> Any polling driver will do this, there's nothing QEMU-specific or
> unusual going on here.
>
> Would you like me to expand the comment explaining that NVMe polling
> consists of checking the phase bit of the latest cqe to check for
> readiness?
>
> Or maybe I misunderstood? :)

I was thinking of something like "Do an early check for
completions. We don't need q->lock here because
nvme_process_completion() only runs (...)"
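
Concretely, the reworded hunk could look something like this (wording
sketch only, combining that suggestion with the existing check):

        /*
         * Do an early check for completions. We don't need q->lock here
         * because nvme_process_completion() only runs in the event loop
         * thread and cannot race with itself.
         */
        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
            continue;
        }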

Sergio.
Re: [PATCH 1/7] block/nvme: poll queues without q->lock
Posted by Stefan Hajnoczi 5 years, 5 months ago
On Fri, May 29, 2020 at 09:49:31AM +0200, Sergio Lopez wrote:
> On Thu, May 28, 2020 at 04:23:50PM +0100, Stefan Hajnoczi wrote:
> > On Mon, May 25, 2020 at 10:07:13AM +0200, Sergio Lopez wrote:
> > > On Tue, May 19, 2020 at 06:11:32PM +0100, Stefan Hajnoczi wrote:
> > > > A lot of CPU time is spent simply locking/unlocking q->lock during
> > > > polling. Check for completion outside the lock to make q->lock disappear
> > > > from the profile.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > >  block/nvme.c | 12 ++++++++++++
> > > >  1 file changed, 12 insertions(+)
> > > >
> > > > diff --git a/block/nvme.c b/block/nvme.c
> > > > index eb2f54dd9d..7eb4512666 100644
> > > > --- a/block/nvme.c
> > > > +++ b/block/nvme.c
> > > > @@ -512,6 +512,18 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
> > > >
> > > >      for (i = 0; i < s->nr_queues; i++) {
> > > >          NVMeQueuePair *q = s->queues[i];
> > > > +        const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
> > > > +        NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
> > > > +
> > > > +        /*
> > > > +         * q->lock isn't needed for checking completion because
> > > > +         * nvme_process_completion() only runs in the event loop thread and
> > > > +         * cannot race with itself.
> > > > +         */
> > > > +        if ((le16_to_cpu(cqe->status) & 0x1) == q->cq_phase) {
> > > > +            continue;
> > > > +        }
> > > > +
> > >
> > > IIUC, this is introducing an early check of the phase bit to determine
> > > if there is something new in the queue.
> > >
> > > I'm fine with this optimization, but I have the feeling that the
> > > comment doesn't properly describe it.
> >
> > I'm not sure I understand. The comment explains why it's safe not to
> > take q->lock. Normally it would be taken. Without the comment readers
> > could be confused why we ignore the locking rules here.
> >
> > As for documenting the cqe->status expression itself, I didn't think of
> > explaining it since it's part of the theory of operation of this device.
> > Any polling driver will do this, there's nothing QEMU-specific or
> > unusual going on here.
> >
> > Would you like me to expand the comment explaining that NVMe polling
> > consists of checking the phase bit of the latest cqe to check for
> > readiness?
> >
> > Or maybe I misunderstood? :)
> 
> I was thinking of something like "Do an early check for
> completions. We don't need q->lock here because
> nvme_process_completion() only runs (...)"

Sure, will fix.

Stefan