If a lot of requests are in the queue, this message spams the logs,
thus rate limit it.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/host/tcp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index dc5bbca58c6dcbce40cfa3de893592d768ebc939..1ed0bc10b2dffe534536b1073abc0302056aa51e 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1257,8 +1257,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	if (ret == -EAGAIN) {
 		ret = 0;
 	} else if (ret < 0) {
-		dev_err(queue->ctrl->ctrl.device,
-			"failed to send request %d\n", ret);
+		dev_err_ratelimited(queue->ctrl->ctrl.device,
+			"failed to send request %d\n", ret);
 		nvme_tcp_fail_request(queue->request);
 		nvme_tcp_done_send_req(queue);
 	}
--
2.48.1
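For reference, the dev_*_ratelimited() helpers wrap the plain printers
with a per-callsite ratelimit state. A paraphrased sketch of what
dev_err_ratelimited() expands to (modeled on include/linux/dev_printk.h;
the exact macro body varies between kernel versions):

#define dev_err_ratelimited(dev, fmt, ...)				\
do {									\
	static DEFINE_RATELIMIT_STATE(_rs,				\
				      DEFAULT_RATELIMIT_INTERVAL,	\
				      DEFAULT_RATELIMIT_BURST);		\
	if (__ratelimit(&_rs))						\
		dev_err(dev, fmt, ##__VA_ARGS__);			\
} while (0)

With the default interval of 5*HZ and burst of 10, at most ten of these
errors reach the log per five-second window; further occurrences are
dropped and summarized in a "callbacks suppressed" line.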
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
> If a lot of requests are in the queue, this message spams the logs,
> thus rate limit it.

Are in the queue when what happens?  Not that I'm against this,
but if we have a known condition where this error is printed a lot
we should probably skip it entirely for that?
On Wed, Jan 29, 2025 at 07:05:34AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
> > If a lot of requests are in the queue, this message spams the logs,
> > thus rate limit it.
>
> Are in the queue when what happens?  Not that I'm against this,
> but if we have a known condition where this error is printed a lot
> we should probably skip it entirely for that?

The condition is that all the elements in the queue->send_list could
fail as a batch. I had a bug in my patches which re-queued all the
failed commands immediately and sent them out again, thus spamming the
log.

This behavior doesn't exist upstream. I just thought it might make
sense to rate limit as a precaution. I don't know if it is worth the
code churn.
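To illustrate why the batch-failure case above stays readable with rate
limiting, here is a hypothetical, self-contained userspace model of the
kernel's interval+burst semantics (illustrative only; not the kernel
implementation):

#include <stdio.h>
#include <time.h>

/* Minimal stand-in for the kernel's struct ratelimit_state. */
struct rl_state {
	time_t begin;	/* start of the current window */
	int interval;	/* window length, seconds */
	int burst;	/* messages allowed per window */
	int printed;	/* emitted in this window */
	int missed;	/* suppressed in this window */
};

/* Rough model of __ratelimit(): 1 = print allowed, 0 = suppress. */
static int rl_ok(struct rl_state *rs)
{
	time_t now = time(NULL);

	if (!rs->begin || now - rs->begin >= rs->interval) {
		/* New window: report what was dropped, reset counters. */
		if (rs->missed)
			fprintf(stderr, "%d callbacks suppressed\n",
				rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	rs->missed++;
	return 0;
}

int main(void)
{
	struct rl_state rs = { .interval = 5, .burst = 10 };

	/* Simulate a whole send_list failing as one batch. */
	for (int i = 0; i < 100; i++)
		if (rl_ok(&rs))
			fprintf(stderr, "failed to send request %d\n", -32);
	return 0;
}

Only the first ten lines reach the log; the remaining ninety are
counted and would be reported once the next window opens.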
On Thu, Jan 30, 2025 at 04:25:35PM +0100, Daniel Wagner wrote:
> On Wed, Jan 29, 2025 at 07:05:34AM +0100, Christoph Hellwig wrote:
> > On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
> > > If a lot of requests are in the queue, this message spams the logs,
> > > thus rate limit it.
> >
> > Are in the queue when what happens?  Not that I'm against this,
> > but if we have a known condition where this error is printed a lot
> > we should probably skip it entirely for that?
>
> The condition is that all the elements in the queue->send_list could
> fail as a batch. I had a bug in my patches which re-queued all the
> failed commands immediately and sent them out again, thus spamming the
> log.
>
> This behavior doesn't exist upstream. I just thought it might make
> sense to rate limit as a precaution. I don't know if it is worth the
> code churn.

I'm fine with the rate limiting. I was just wondering if there is a
case where we'd easily hit it and could do even better.
On 31/01/2025 9:29, Christoph Hellwig wrote:
> On Thu, Jan 30, 2025 at 04:25:35PM +0100, Daniel Wagner wrote:
>> On Wed, Jan 29, 2025 at 07:05:34AM +0100, Christoph Hellwig wrote:
>>> On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
>>>> If a lot of requests are in the queue, this message spams the logs,
>>>> thus rate limit it.
>>> Are in the queue when what happens?  Not that I'm against this,
>>> but if we have a known condition where this error is printed a lot
>>> we should probably skip it entirely for that?
>> The condition is that all the elements in the queue->send_list could
>> fail as a batch. I had a bug in my patches which re-queued all the
>> failed commands immediately and sent them out again, thus spamming the
>> log.
>>
>> This behavior doesn't exist upstream. I just thought it might make
>> sense to rate limit as a precaution. I don't know if it is worth the
>> code churn.
> I'm fine with the rate limiting. I was just wondering if there is a
> case where we'd easily hit it and could do even better.

I agree with the change. The reason I think it's useful to keep the
message is that it has been a really useful indication when debugging
UAF panics.