[PATCH v5] nvme: Skip trace complete_rq on host path error

전민식 posted 1 patch 1 week ago
drivers/nvme/host/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
[PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by 전민식 1 week ago
Hi hch,

I added a comment about why I do trace skip if it's host path error.


Thanks

Regards,
Minsik Jeon


From 3ee001c00cb4c7843d7bbb0c11fcc1fafeded9b4 Mon Sep 17 00:00:00 2001
From: Minsik Jeon <hmi.jeon@samsung.com>
Date: Thu, 26 Mar 2026 11:09:57 +0900
Subject: [PATCH v5] nvme: Skip trace complete_rq on host path error

we were checking host_pathing_error before calling nvme_setup_cmd().
This is caused the command setup to be skipped entirely when a pathing
error occurred, making it impossible to trace the nvme command via
trace_cmd nvme_complete_rq().

As a result, when nvme_complete_rq() logged a completion with cmdid=0,
it was impossible to correlate the completion with the nvme command
request.

This patch Skip trace_nvme_complete_rq() on NVMe host path error.

Co-authored-by: Beomsoo Kim <beomsooo.kim@samsung.com>
Co-authored-by: Eunsoo Lee <euns212.lee@samsung.com>
Co-authored-by: Steven Seungcheol Lee <sc108.lee@samsung.com>
Signed-off-by: Minsik Jeon <hmi.jeon@samsung.com>
---
 drivers/nvme/host/core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 766e9cc4ffca..df5a47b8560d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -458,7 +458,14 @@ void nvme_complete_rq(struct request *req)
 {
 	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
 
-	trace_nvme_complete_rq(req);
+	/*
+	 * The idea for these trace events was to match up commands
+	 * dispatched to hardware with the hardware's posted response.
+	 * So skip tracing for undispatched commands.
+	 */
+	if (nvme_req(req)->status != NVME_SC_HOST_PATH_ERROR)
+		trace_nvme_complete_rq(req);
+
 	nvme_cleanup_cmd(req);
 
 	/*
-- 
2.52.0
Re: [PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by Keith Busch 1 week ago
On Thu, Mar 26, 2026 at 03:51:52PM +0900, 전민식 wrote:
>  {
>  	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
>  
> -	trace_nvme_complete_rq(req);
> +	/*
> +	 * The idea for these trace events was to match up commands
> +	 * dispatched to hardware with the hardware's posted response.
> +	 * So skip tracing for undispatched commands.
> +	 */
> +	if (nvme_req(req)->status != NVME_SC_HOST_PATH_ERROR)
> +		trace_nvme_complete_rq(req);
> +

Well, how do we know a controller doesnn't actually return that status
code? I was just suggesting to skip the trace for the condition we never
dispatched the command. An added bonus is we don't need a mostly
unnecessary 'if' check on every IO.

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f5ebcaa2f859c..0dcccdca2965e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -454,11 +454,10 @@ void nvme_end_req(struct request *req)
 	blk_mq_end_request(req, status);
 }
 
-void nvme_complete_rq(struct request *req)
+static void __nvme_complete_rq(struct request *req)
 {
 	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
 
-	trace_nvme_complete_rq(req);
 	nvme_cleanup_cmd(req);
 
 	/*
@@ -493,6 +492,12 @@ void nvme_complete_rq(struct request *req)
 		return;
 	}
 }
+
+void nvme_complete_rq(struct request *req)
+{
+	trace_nvme_complete_rq(req);
+	__nvme_complete_rq(req);
+}
 EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
 void nvme_complete_batch_req(struct request *req)
@@ -513,7 +518,7 @@ blk_status_t nvme_host_path_error(struct request *req)
 {
 	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
 	blk_mq_set_request_complete(req);
-	nvme_complete_rq(req);
+	__nvme_complete_rq(req);
 	return BLK_STS_OK;
 }
 EXPORT_SYMBOL_GPL(nvme_host_path_error);
--
Re: [PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by hch@lst.de 1 week ago
On Thu, Mar 26, 2026 at 08:28:46AM -0600, Keith Busch wrote:
> On Thu, Mar 26, 2026 at 03:51:52PM +0900, 전민식 wrote:
> >  {
> >  	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
> >  
> > -	trace_nvme_complete_rq(req);
> > +	/*
> > +	 * The idea for these trace events was to match up commands
> > +	 * dispatched to hardware with the hardware's posted response.
> > +	 * So skip tracing for undispatched commands.
> > +	 */
> > +	if (nvme_req(req)->status != NVME_SC_HOST_PATH_ERROR)
> > +		trace_nvme_complete_rq(req);
> > +
> 
> Well, how do we know a controller doesnn't actually return that status
> code? I was just suggesting to skip the trace for the condition we never
> dispatched the command. An added bonus is we don't need a mostly
> unnecessary 'if' check on every IO.

The 7?h error values were added for host use.  The description of
the section in the spec suggests this, but isn't actually as clear
as I would like it.  I wonder if we need to verify that controller
don't incorrectly return it, as that could cause some problems?

Independent of that your patch below looks sane to me.

Re: [PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by Keith Busch 1 week ago
On Thu, Mar 26, 2026 at 03:31:24PM +0100, hch@lst.de wrote:
> On Thu, Mar 26, 2026 at 08:28:46AM -0600, Keith Busch wrote:
> > On Thu, Mar 26, 2026 at 03:51:52PM +0900, 전민식 wrote:
> > >  {
> > >  	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
> > >  
> > > -	trace_nvme_complete_rq(req);
> > > +	/*
> > > +	 * The idea for these trace events was to match up commands
> > > +	 * dispatched to hardware with the hardware's posted response.
> > > +	 * So skip tracing for undispatched commands.
> > > +	 */
> > > +	if (nvme_req(req)->status != NVME_SC_HOST_PATH_ERROR)
> > > +		trace_nvme_complete_rq(req);
> > > +
> > 
> > Well, how do we know a controller doesnn't actually return that status
> > code? I was just suggesting to skip the trace for the condition we never
> > dispatched the command. An added bonus is we don't need a mostly
> > unnecessary 'if' check on every IO.
> 
> The 7?h error values were added for host use.  The description of
> the section in the spec suggests this, but isn't actually as clear
> as I would like it.  I wonder if we need to verify that controller
> don't incorrectly return it, as that could cause some problems?

Yeah, the spec is not very clear on their use. It just defines the codes
and a one sentence description, and then never mentioned again. I think
it allows the possibility for the controller to internally complete a
command with such status from some undefined OOB mechanism. I'm not sure
why the host needs to have their own self-generated status codes anyway
since it can already attach whatever arbitrary flags it wants to its
private data, like how we use NVME_REQ_CANCELLED.

Anyway if a controller did return it for whatever reason, even if it is
a bug, we'd want to see it in the trace.
Re: [PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by 전민식 6 days, 14 hours ago
On Thu, Mar 26, 2026 at 08:43 -0600, Keith Busch wrote:
> Yeah, the spec is not very clear on their use. It just defines the codes
> and a one sentence description, and then never mentioned again. I think
> it allows the possibility for the controller to internally complete a
> command with such status from some undefined OOB mechanism. I'm not sure
> why the host needs to have their own self-generated status codes anyway
> since it can already attach whatever arbitrary flags it wants to its
> private data, like how we use NVME_REQ_CANCELLED.
> 
> Anyway if a controller did return it for whatever reason, even if it is
> a bug, we'd want to see it in the trace.

Sorry, The date is wrong, so I send it again.
On Thu, Mar 26, 2026 at 08:43 -0600, hch@lst.de wrote:
-> On Thu, Mar 26, 2026 at 14:43PM UTC, Keith Busch wrote:

I've confirmed what you discussed, and it seems like the approach you
proposed is a better one.
Can I send patch v7 as above?


Regards,
Minsik Jeon
Re: [PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by 전민식 6 days, 14 hours ago
On Thu, Mar 26, 2026 at 03:43PM +0100, hch@lst.de wrote:
> Yeah, the spec is not very clear on their use. It just defines the codes
> and a one sentence description, and then never mentioned again. I think
> it allows the possibility for the controller to internally complete a
> command with such status from some undefined OOB mechanism. I'm not sure
> why the host needs to have their own self-generated status codes anyway
> since it can already attach whatever arbitrary flags it wants to its
> private data, like how we use NVME_REQ_CANCELLED.
> 
> Anyway if a controller did return it for whatever reason, even if it is
> a bug, we'd want to see it in the trace.


I've confirmed what you discussed, and it seems like the approach you
proposed is a better one.
Can I send patch v7 as above?

Regrad,
Minsik Jeon
Re: [PATCH v5] nvme: Skip trace complete_rq on host path error
Posted by hch@lst.de 1 week ago
On Thu, Mar 26, 2026 at 03:51:52PM +0900, 전민식 wrote:
> Hi hch,
> 
> I added a comment about why I do trace skip if it's host path error.

The patch looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But such note go below the '--' so that don't end up in git history.