[v1] nvme: Let the blocklayer set timeouts for requests

[PATCH] nvme: Let the blocklayer set timeouts for requests

Posted by Heyne, Maximilian 2 months, 1 week ago

When initializing an nvme request which is about to be sent to the block
layer, we do not need to initialize its timeout. If it's left
uninitialized at 0, the block layer will use the request queue's timeout
in blk_add_timer (via nvme_start_request, which is called from
nvme_*_queue_rq). These timeouts are set up as either NVME_IO_TIMEOUT or
NVME_ADMIN_TIMEOUT when the request queues are created.

Because the io_timeout of the IO queues can actually be modified via
sysfs, the following situation can occur:

1) NVME_IO_TIMEOUT = 30 (default module parameter)
2) nvme1n1 is probed. The IO queues' default timeout is 30 s
3) manually change the IO timeout to 90 s
   echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
4) nvme zns report-zones /dev/nvme1n1
   This command issues IO commands with a timeout of 30 s instead of
   the desired 90 s, which might be more suitable for this device.

This patch, therefore, improves the consistency of IO timeout usage.
However, there are still uses of NVME_IO_TIMEOUT which could be
inconsistent with what is set in the device's request_queue by the user.

Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
 drivers/nvme/host/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f1f719351f3f2..3a6d74e6dae11 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
 		struct nvme_ns *ns = req->q->disk->private_data;
 
 		logging_enabled = ns->head->passthru_err_log_enabled;
-		req->timeout = NVME_IO_TIMEOUT;
 	} else { /* no queuedata implies admin queue */
 		logging_enabled = nr->ctrl->passthru_err_log_enabled;
-		req->timeout = NVME_ADMIN_TIMEOUT;
 	}
 
 	if (!logging_enabled)
-- 
2.47.3

Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

Re: [PATCH] nvme: Let the blocklayer set timeouts for requests

Posted by Keith Busch 2 months ago

On Tue, Dec 02, 2025 at 01:58:19PM +0000, Heyne, Maximilian wrote:
> When initializing an nvme request which is about to be sent to the block
> layer, we do not need to initialize its timeout. If it's left
> uninitialized at 0, the block layer will use the request queue's timeout
> in blk_add_timer (via nvme_start_request, which is called from
> nvme_*_queue_rq). These timeouts are set up as either NVME_IO_TIMEOUT or
> NVME_ADMIN_TIMEOUT when the request queues are created.
> 
> Because the io_timeout of the IO queues can actually be modified via
> sysfs, the following situation can occur:
> 
> 1) NVME_IO_TIMEOUT = 30 (default module parameter)
> 2) nvme1n1 is probed. IO queues default timeout is 30 s
> 3) manually change the IO timeout to 90 s
>    echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
> 4) nvme zns report-zones /dev/nvme1n1
>    This command issues IO commands with timeout 30 s instead of the
>    wanted 90 s which might be more suitable for this device.

Does this example really use 30s, though? User space commands should be
going through nvme_submit_user_cmd(), which overrides the timeout set
from the nvme_init_request with whatever the user requested (usually 0).

The code change looks fine, though.

Re: [PATCH] nvme: Let the blocklayer set timeouts for requests

Posted by Heyne, Maximilian 2 months ago

On Tue, Dec 02, 2025 at 10:39:11AM -0700, Keith Busch wrote:
> On Tue, Dec 02, 2025 at 01:58:19PM +0000, Heyne, Maximilian wrote:
> > When initializing an nvme request which is about to be sent to the block
> > layer, we do not need to initialize its timeout. If it's left
> > uninitialized at 0, the block layer will use the request queue's timeout
> > in blk_add_timer (via nvme_start_request, which is called from
> > nvme_*_queue_rq). These timeouts are set up as either NVME_IO_TIMEOUT or
> > NVME_ADMIN_TIMEOUT when the request queues are created.
> > 
> > Because the io_timeout of the IO queues can actually be modified via
> > sysfs, the following situation can occur:
> > 
> > 1) NVME_IO_TIMEOUT = 30 (default module parameter)
> > 2) nvme1n1 is probed. IO queues default timeout is 30 s
> > 3) manually change the IO timeout to 90 s
> >    echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
> > 4) nvme zns report-zones /dev/nvme1n1
> >    This command issues IO commands with timeout 30 s instead of the
> >    wanted 90 s which might be more suitable for this device.
> 
> Does this example really use 30s, though? User space commands should be
> going through nvme_submit_user_cmd(), which overrides the timeout set
> from the nvme_init_request with whatever the user requested (usually 0).

You're right. I worked on multiple (older) kernel versions and forgot
about this case. It was commit 470e900c8036ff ("nvme: refactor
nvme_alloc_request") which subtly changed the behavior, but only for
the ioctl case. So ioctls are fine, but everything that goes via
nvme_submit_sync_cmd, for example, still shows the issue. So we need to
update the commit message accordingly. Sorry for that. I'll give it a
day or two for further comments on this patch and then send it with a
more accurate message.
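
For illustration, the difference discussed above can be sketched as a
small userspace model. This is not kernel code: the model_* names, the
two structs, and MODEL_IO_TIMEOUT_MS are hypothetical stand-ins. The only
behavior it mirrors is the fallback described in the commit message,
where a request timeout of 0 makes blk_add_timer use the request queue's
timeout, and it assumes a submission path that keeps whatever
nvme_init_request set.

```c
#include <assert.h>

/* Hypothetical stand-ins for the request queue and the request. */
struct model_queue {
	unsigned int rq_timeout_ms;	/* the value sysfs io_timeout changes */
};

struct model_request {
	const struct model_queue *q;
	unsigned int timeout_ms;	/* 0 means "not set by the driver" */
};

/* Assumed driver default, matching the 30 s from the example. */
#define MODEL_IO_TIMEOUT_MS 30000u

/* Models the fallback: an unset (0) timeout inherits the queue's timeout. */
static unsigned int model_effective_timeout(const struct model_request *req)
{
	return req->timeout_ms ? req->timeout_ms : req->q->rq_timeout_ms;
}

/* Before the patch: init hard-codes the driver default. */
static void model_init_request_old(struct model_request *req,
				   const struct model_queue *q)
{
	req->q = q;
	req->timeout_ms = MODEL_IO_TIMEOUT_MS;
}

/* After the patch: init leaves the timeout at 0. */
static void model_init_request_new(struct model_request *req,
				   const struct model_queue *q)
{
	req->q = q;
	req->timeout_ms = 0;
}

/* Replays the scenario: queue created with 30 s, then changed to 90 s. */
void model_check_scenario(void)
{
	struct model_queue q = { .rq_timeout_ms = MODEL_IO_TIMEOUT_MS };
	struct model_request old_req, new_req;

	q.rq_timeout_ms = 90000;	/* echo 90000 > .../queue/io_timeout */

	model_init_request_old(&old_req, &q);
	model_init_request_new(&new_req, &q);

	assert(model_effective_timeout(&old_req) == 30000);	/* ignores sysfs */
	assert(model_effective_timeout(&new_req) == 90000);	/* follows sysfs */
}
```

In this model, a caller that overrides the timeout after init (as the
ioctl path reportedly does) would be unaffected either way, which matches
the point raised in the review.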
