On 5/19/20 7:11 PM, Stefan Hajnoczi wrote:
> Do not access a CQE after incrementing q->cq.head and releasing q->lock.
> It is unlikely that this causes problems in practice but it's a latent
> bug.
>
> The reason why it should be safe at the moment is that completion
> processing is not re-entrant and the CQ doorbell isn't written until the
> end of nvme_process_completion().
>
> Make this change now because later patches will make completion
> processing re-entrant.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> block/nvme.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 5286227074..6bf58bc6aa 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -321,11 +321,14 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>      q->busy = true;
>      assert(q->inflight >= 0);
>      while (q->inflight) {
> +        int ret;
>          int16_t cid;
> +
>          c = (NvmeCqe *)&q->cq.queue[q->cq.head * NVME_CQ_ENTRY_BYTES];
>          if ((le16_to_cpu(c->status) & 0x1) == q->cq_phase) {
>              break;
>          }
> +        ret = nvme_translate_error(c);
Tricky.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>          q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
>          if (!q->cq.head) {
>              q->cq_phase = !q->cq_phase;
> @@ -344,7 +347,7 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>          preq->busy = false;
>          preq->cb = preq->opaque = NULL;
>          qemu_mutex_unlock(&q->lock);
> -        req.cb(req.opaque, nvme_translate_error(c));
> +        req.cb(req.opaque, ret);
>          qemu_mutex_lock(&q->lock);
>          q->inflight--;
>          progress = true;
>
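For anyone following along, here is a minimal, self-contained sketch of the
pattern this patch enforces: translate the status while the CQE slot is still
ours (lock held, cq.head not yet advanced), and only use the saved value after
unlocking. This is illustrative only, not QEMU code; FakeQueue,
translate_error() and completion_cb() are made-up stand-ins for NVMeQueuePair,
nvme_translate_error() and the request callback.

/* Illustrative sketch only -- not QEMU code. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_SIZE 8

typedef struct {
    uint16_t status;   /* bit 0 is the phase bit, the rest is the status field */
} FakeCqe;

typedef struct {
    pthread_mutex_t lock;
    FakeCqe cq[QUEUE_SIZE];
    unsigned head;
} FakeQueue;

/* Hypothetical stand-in for nvme_translate_error(): 0 on success. */
static int translate_error(const FakeCqe *c)
{
    return (c->status >> 1) ? -5 /* -EIO */ : 0;
}

static void process_one(FakeQueue *q, void (*cb)(int ret))
{
    pthread_mutex_lock(&q->lock);

    FakeCqe *c = &q->cq[q->head];
    int ret = translate_error(c);          /* snapshot while the CQE is ours */
    q->head = (q->head + 1) % QUEUE_SIZE;  /* slot may be reused from here on */

    pthread_mutex_unlock(&q->lock);
    cb(ret);                               /* no CQE access after unlock */
}

static void completion_cb(int ret)
{
    printf("completed with ret=%d\n", ret);
}

int main(void)
{
    FakeQueue q = { .lock = PTHREAD_MUTEX_INITIALIZER };
    process_one(&q, completion_cb);
    return 0;
}

Once completion processing becomes re-entrant, another caller may advance the
head and reuse the slot while the callback runs, so the value captured under
the lock is the only safe copy to pass to the callback.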