[PATCH] hw/nvme: actually implement abort

Ayush Mishra posted 1 patch 3 months, 2 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20240702080232.848849-1-ayush.m55@samsung.com
Maintainers: Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Jesper Devantier <foss@defmacro.it>
hw/nvme/ctrl.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
[PATCH] hw/nvme: actually implement abort
Posted by Ayush Mishra 3 months, 2 weeks ago
Abort was not implemented previously, but we can implement it for AERs and asynchronously for I/O.

Signed-off-by: Ayush Mishra <ayush.m55@samsung.com>
---
 hw/nvme/ctrl.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..a38037b5f8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1759,6 +1759,10 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
         break;
     }
 
+    if (ret == -ECANCELED) {
+        status = NVME_CMD_ABORT_REQ;
+    }
+
     trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
 
     error_setg_errno(&local_err, -ret, "aio failed");
@@ -5759,12 +5763,40 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req)
 {
     uint16_t sqid = le32_to_cpu(req->cmd.cdw10) & 0xffff;
+    uint16_t cid  = (le32_to_cpu(req->cmd.cdw10) >> 16) & 0xffff;
+    NvmeSQueue *sq;
+    NvmeRequest *r, *next;
+    int i;
 
     req->cqe.result = 1;
     if (nvme_check_sqid(n, sqid)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
+    if (sqid == 0) {
+        for (i = 0; i < n->outstanding_aers; i++) {
+            NvmeRequest *re = n->aer_reqs[i];
+            if (re->cqe.cid == cid) {
+                memmove(n->aer_reqs + i, n->aer_reqs + i + 1,
+                         (n->outstanding_aers - i - 1) * sizeof(NvmeRequest *));
+                n->outstanding_aers--;
+                re->status = NVME_CMD_ABORT_REQ;
+                req->cqe.result = 0;
+                nvme_enqueue_req_completion(&n->admin_cq, re);
+                return NVME_SUCCESS;
+            }
+        }
+    }
+    sq = n->sq[sqid]; /* index only after sqid has been validated */
+    QTAILQ_FOREACH_SAFE(r, &sq->out_req_list, entry, next) {
+        if (r->cqe.cid == cid) {
+            if (r->aiocb) {
+                blk_aio_cancel_async(r->aiocb);
+            }
+            break;
+        }
+    }
+
     return NVME_SUCCESS;
 }
 
-- 
2.43.0
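
For readers following the AER branch of the hunk above, here is a minimal standalone sketch of the same removal step: take one pointer out of a packed array and shift the tail down with memmove. The names toy_req and toy_remove_by_cid are hypothetical and are not part of hw/nvme; the sketch only assumes that the array is densely packed, as n->aer_reqs appears to be in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queued AER; this is not QEMU's NvmeRequest. */
struct toy_req {
    unsigned cid;
};

/*
 * Remove the first entry whose cid matches from a packed array of pointers.
 * Returns the removed entry, or NULL if no entry matches. *count is
 * decremented only when an entry is removed.
 */
static struct toy_req *toy_remove_by_cid(struct toy_req **reqs, size_t *count,
                                         unsigned cid)
{
    assert(reqs && count);

    for (size_t i = 0; i < *count; i++) {
        struct toy_req *re = reqs[i];

        if (re->cid != cid) {
            continue;
        }

        /* Close the gap by shifting the tail down one slot. */
        memmove(reqs + i, reqs + i + 1, (*count - i - 1) * sizeof(*reqs));
        (*count)--;
        return re;
    }

    return NULL;
}
```

The caller still owns the returned entry and is responsible for completing it, which is the role nvme_enqueue_req_completion plays in the patch.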
Re: [PATCH] hw/nvme: actually implement abort
Posted by Keith Busch 3 months, 2 weeks ago
On Tue, Jul 02, 2024 at 01:32:32PM +0530, Ayush Mishra wrote:
> Abort was not implemented previously, but we can implement it for AERs and asynchronously for I/O.

Not implemented for a reason. The target has no idea if the CID the
host requested to be aborted is from the same context that the target
has. Target may have previously completed it, and the host re-issued a
new command after the abort, and due to the queueing could have been
observed in a different order, and now you aborted the wrong command.
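
The ordering problem described above can be sketched with a toy model. Everything here is hypothetical (toy_target, toy_submit, toy_complete, toy_abort) and is not QEMU code; it only assumes that the target identifies an outstanding command by its CID and nothing else.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CID 16

/* Hypothetical target-side view of one submission queue. */
struct toy_target {
    bool busy[TOY_MAX_CID];  /* is a command with this CID outstanding? */
    char owner[TOY_MAX_CID]; /* label of the host command using the CID */
};

/* The target starts executing host command `label` under `cid`. */
static void toy_submit(struct toy_target *t, unsigned cid, char label)
{
    assert(cid < TOY_MAX_CID && !t->busy[cid]);
    t->busy[cid] = true;
    t->owner[cid] = label;
}

/* The target completes whatever command currently holds `cid`. */
static void toy_complete(struct toy_target *t, unsigned cid)
{
    assert(cid < TOY_MAX_CID);
    t->busy[cid] = false;
}

/*
 * Abort by CID alone. Returns the label of the command that was aborted,
 * or 0 if no command with that CID is outstanding.
 */
static char toy_abort(struct toy_target *t, unsigned cid)
{
    assert(cid < TOY_MAX_CID);
    if (!t->busy[cid]) {
        return 0;
    }
    t->busy[cid] = false;
    return t->owner[cid];
}
```

If the host sends an Abort meant for command 'A' on CID 7, but the target runs toy_submit(7, 'A'), toy_complete(7) and toy_submit(7, 'B') before it processes that Abort, then toy_abort(7) hits 'B'. The target has no way to tell the two apart from the CID.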
Re: [PATCH] hw/nvme: actually implement abort
Posted by Klaus Jensen 3 months, 2 weeks ago
On Jul  2 09:24, Keith Busch wrote:
> On Tue, Jul 02, 2024 at 01:32:32PM +0530, Ayush Mishra wrote:
> > Abort was not implemented previously, but we can implement it for AERs and asynchronously for I/O.
> 
> Not implemented for a reason. The target has no idea if the CID the
> host requested to be aborted is from the same context that the target
> has. Target may have previously completed it, and the host re-issued a
> new command after the abort, and due to the queueing could have been
> observed in a different order, and now you aborted the wrong command.

I might be missing something here, but are you saying that the Abort
command is fundamentally flawed? Isn't this a host issue? The Abort is
for a specific CID on a specific SQID. The host *should* not screw this
up and reuse a CID it has an outstanding Abort on?

I don't think there are a lot of I/O commands that a host would be able
to cancel (in QEMU, not at all, because only the iscsi backend
actually implements blk_aio_cancel_async). But some commands that issue
multiple AIOs, like Copy, may be long running and with this it can
actually be cancelled.
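
As a rough illustration of why a multi-AIO command is a better Abort candidate than a single I/O, here is a toy model of a command that issues its I/Os one range at a time and checks a cancel flag between them. The names toy_copy, toy_copy_step and toy_copy_cancel are hypothetical; this is not how Copy is implemented in hw/nvme, only a sketch of the idea under that one-range-at-a-time assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_status { TOY_IN_PROGRESS, TOY_DONE, TOY_ABORTED };

/* Hypothetical long-running command that copies `nranges` source ranges. */
struct toy_copy {
    unsigned nranges;   /* total ranges to copy */
    unsigned completed; /* ranges copied so far */
    bool cancel;        /* set when an Abort targets this command */
};

/* Called when an Abort names this command. */
static void toy_copy_cancel(struct toy_copy *c)
{
    c->cancel = true;
}

/*
 * Issue the next range, unless the command has already finished or has been
 * cancelled. A cancel that arrives after the last range is too late and the
 * command still reports TOY_DONE.
 */
static enum toy_status toy_copy_step(struct toy_copy *c)
{
    assert(c->completed <= c->nranges);

    if (c->completed == c->nranges) {
        return TOY_DONE;
    }
    if (c->cancel) {
        return TOY_ABORTED;
    }

    c->completed++;
    return c->completed == c->nranges ? TOY_DONE : TOY_IN_PROGRESS;
}
```

With four ranges, two calls to toy_copy_step followed by toy_copy_cancel leave the command aborted with two ranges copied. A single-range command has no such window between steps.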

And with regards to AERs, I don't see why it is not advantageous to be
able to Abort one?
Re: [PATCH] hw/nvme: actually implement abort
Posted by Klaus Jensen 3 months, 1 week ago
On Jul  2 20:55, Klaus Jensen wrote:
> On Jul  2 09:24, Keith Busch wrote:
> > On Tue, Jul 02, 2024 at 01:32:32PM +0530, Ayush Mishra wrote:
> > > Abort was not implemented previously, but we can implement it for AERs and asynchronously for I/O.
> > 
> > Not implemented for a reason. The target has no idea if the CID the
> > host requested to be aborted is from the same context that the target
> > has. Target may have previously completed it, and the host re-issued a
> > new command after the abort, and due to the queueing could have been
> > observed in a different order, and now you aborted the wrong command.
> 
> I might be missing something here, but are you saying that the Abort
> command is fundamentally flawed? Isn't this a host issue? The Abort is
> for a specific CID on a specific SQID. The host *should* not screw this
> up and reuse a CID it has an outstanding Abort on?
> 
> I don't think there are a lot of I/O commands that a host would be able
> to cancel (in QEMU, not at all, because only the iscsi backend
> actually implements blk_aio_cancel_async). But some commands that issue
> multiple AIOs, like Copy, may be long running and with this it can
> actually be cancelled.
> 
> And with regards to AERs, I don't see why it is not advantageous to be
> able to Abort one?

Keith, any thoughts on this?
Re: [PATCH] hw/nvme: actually implement abort
Posted by Keith Busch 3 months, 1 week ago
On Wed, Jul 10, 2024 at 11:09:43AM +0200, Klaus Jensen wrote:
> On Jul  2 20:55, Klaus Jensen wrote:
> > On Jul  2 09:24, Keith Busch wrote:
> > > On Tue, Jul 02, 2024 at 01:32:32PM +0530, Ayush Mishra wrote:
> > > > Abort was not implemented previously, but we can implement it for AERs and asynchronously for I/O.
> > > 
> > > Not implemented for a reason. The target has no idea if the CID the
> > > host requested to be aborted is from the same context that the target
> > > has. Target may have previously completed it, and the host re-issued a
> > > new command after the abort, and due to the queueing could have been
> > > observed in a different order, and now you aborted the wrong command.
> > 
> > I might be missing something here, but are you saying that the Abort
> > command is fundamentally flawed? Isn't this a host issue? The Abort is
> > for a specific CID on a specific SQID. The host *should* not screw this
> > up and reuse a CID it has an outstanding Abort on?
> > 
> > I don't think there are a lot of I/O commands that a host would be able
> > to cancel (in QEMU, not at all, because only the iscsi backend
> > actually implements blk_aio_cancel_async). But some commands that issue
> > multiple AIOs, like Copy, may be long running and with this it can
> > actually be cancelled.
> > 
> > And with regards to AERs, I don't see why it is not advantageous to be
> > able to Abort one?
> 
> Keith, any thoughts on this?

Oh, you can take this if you want, I'm just mentioning the pitfalls with
the abort command. While sequestering command IDs that are being
aborted may be good practice for the host, the spec doesn't say anything
about it. The Linux driver doesn't do that at least, though it recently
created a different mechanism to avoid immediate command id reuse.
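
For illustration only, host-side sequestering could look roughly like the sketch below: a CID that has an Abort outstanding is held back from reuse until that Abort completes, even if the original command has already finished. The names toy_cid_pool and toy_cid_* are hypothetical; this is neither the Linux driver's mechanism nor anything the spec requires.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_CIDS 8

/* Hypothetical per-queue CID allocator on the host. */
struct toy_cid_pool {
    bool in_use[TOY_NUM_CIDS];   /* a command is outstanding on this CID */
    bool aborting[TOY_NUM_CIDS]; /* an Abort naming this CID is outstanding */
};

/* Return the lowest CID that is neither in use nor sequestered, or -1. */
static int toy_cid_alloc(struct toy_cid_pool *p)
{
    for (int cid = 0; cid < TOY_NUM_CIDS; cid++) {
        if (!p->in_use[cid] && !p->aborting[cid]) {
            p->in_use[cid] = true;
            return cid;
        }
    }
    return -1;
}

/* The command on `cid` completed. */
static void toy_cid_complete(struct toy_cid_pool *p, int cid)
{
    assert(cid >= 0 && cid < TOY_NUM_CIDS);
    p->in_use[cid] = false;
}

/* The host sent an Abort for `cid`: sequester it. */
static void toy_cid_abort_sent(struct toy_cid_pool *p, int cid)
{
    assert(cid >= 0 && cid < TOY_NUM_CIDS);
    p->aborting[cid] = true;
}

/* The Abort for `cid` completed: the CID may be reused again. */
static void toy_cid_abort_done(struct toy_cid_pool *p, int cid)
{
    assert(cid >= 0 && cid < TOY_NUM_CIDS);
    p->aborting[cid] = false;
}
```

With this in place, a command that completes while its Abort is still in flight does not free its CID for the next submission, so the target cannot mistake a new command for the one the host meant to abort.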