[PATCH v1] can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in J1939 Socket

Oleksij Rempel posted 1 patch 11 months ago
net/can/j1939/socket.c | 5 +++++
1 file changed, 5 insertions(+)
Posted by Oleksij Rempel 11 months ago
This patch addresses an issue within the j1939_sk_send_loop_abort()
function in the j1939/socket.c file, specifically in the context of
Transport Protocol (TP) sessions.

Without this patch, when a TP session is initiated and a Clear To Send
(CTS) frame is received from the remote side requesting one data packet,
the kernel dispatches the first Data Transfer (DT) frame and then waits
for the next CTS. If the remote side doesn't respond with another CTS,
the kernel aborts the session due to a timeout. User space then receives
EPOLLERR on the socket, and the socket becomes active.

However, when user space tries to read the error queue from the socket
with sock.recvmsg(, , socket.MSG_ERRQUEUE), the call returns -EAGAIN,
given that the socket is non-blocking. This situation results in an
infinite loop: user space repeatedly calls epoll(), epoll() returns
the socket file descriptor with EPOLLERR, but the read of the error
queue with recvmsg() fails with -EAGAIN again.
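
For illustration, the user-space side of this loop can be sketched in
Python as follows. This is a minimal sketch and not the reporter's code:
drain_errqueue() and the FakeSock stand-in are hypothetical names, and
the buffer sizes are arbitrary. It only shows that draining an empty
error queue makes no progress, so a wakeup that keeps reporting EPOLLERR
spins.

```python
import socket

# MSG_ERRQUEUE is Linux-specific; fall back to its Linux value (0x2000)
# so the sketch also imports on other platforms.
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)


def drain_errqueue(sock, bufsize=128, ancbufsize=512):
    """Read the error queue of a non-blocking socket until it is empty.

    Returns the list of (data, ancdata) pairs that were queued.
    """
    messages = []
    while True:
        try:
            data, ancdata, _flags, _addr = sock.recvmsg(
                bufsize, ancbufsize, MSG_ERRQUEUE)
        except BlockingIOError:
            # EAGAIN: nothing (more) is queued.
            break
        messages.append((data, ancdata))
    return messages


class FakeSock:
    """Hypothetical stand-in for a J1939 socket, used only for this demo."""

    def __init__(self, queued):
        self.queued = list(queued)

    def recvmsg(self, bufsize, ancbufsize, flags):
        if not self.queued:
            raise BlockingIOError(11, "Resource temporarily unavailable")
        return self.queued.pop(0), [], flags, None


# Situation described above: EPOLLERR is reported, but the queue is empty.
empty = drain_errqueue(FakeSock([]))
# Situation with an actual error-queue entry to consume.
one = drain_errqueue(FakeSock([b"abort"]))
```

In a real application, sock would be the non-blocking J1939 socket that
epoll just reported with EPOLLERR. If drain_errqueue() returns an empty
list on every such wakeup, the application is in the loop described
above.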

This patch introduces an additional check for the J1939_SOCK_ERRQUEUE
flag within the j1939_sk_send_loop_abort() function. If the flag is set,
the application has subscribed to receive error queue messages, and the
kernel can communicate the current transfer state via the error queue.
The function therefore returns early, which avoids needlessly putting
the socket into an error state and breaks the infinite loop. A socket
error is only needed if the application isn't using the error queue,
since without one it would not be aware of transfer issues.
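
As a hedged illustration of the "subscribed" case: an application
would typically set J1939_SOCK_ERRQUEUE by enabling the
SO_J1939_ERRQUEUE socket option. The sketch below assumes the UAPI
values from <linux/can.h> and <linux/can/j1939.h> (SOL_CAN_BASE = 100,
CAN_J1939 = 7, SO_J1939_ERRQUEUE = 4) as fallbacks.
enable_j1939_errqueue() and RecordingSock are hypothetical names used
only for this demo.

```python
import socket

# Fallback values are assumptions taken from the Linux UAPI headers, used
# only if the running Python's socket module does not export the names.
SOL_CAN_BASE = getattr(socket, "SOL_CAN_BASE", 100)
CAN_J1939 = getattr(socket, "CAN_J1939", 7)
SOL_CAN_J1939 = SOL_CAN_BASE + CAN_J1939
SO_J1939_ERRQUEUE = getattr(socket, "SO_J1939_ERRQUEUE", 4)


def enable_j1939_errqueue(sock):
    """Ask the J1939 stack to report transfer state via the error queue."""
    sock.setsockopt(SOL_CAN_J1939, SO_J1939_ERRQUEUE, 1)


class RecordingSock:
    """Hypothetical stand-in that records setsockopt() calls."""

    def __init__(self):
        self.calls = []

    def setsockopt(self, level, optname, value):
        self.calls.append((level, optname, value))


# On a real system, sock would instead be created with
# socket.socket(socket.AF_CAN, socket.SOCK_DGRAM, socket.CAN_J1939).
rec = RecordingSock()
enable_j1939_errqueue(rec)
```

With the option enabled, the patched j1939_sk_send_loop_abort() returns
before touching sk->sk_err, and the application learns about the abort
from the error queue instead.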

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-by: David Jander <david@protonic.nl>
Tested-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 net/can/j1939/socket.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 1790469b2580..35970c25496a 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -1088,6 +1088,11 @@ void j1939_sk_errqueue(struct j1939_session *session,
 
 void j1939_sk_send_loop_abort(struct sock *sk, int err)
 {
+	struct j1939_sock *jsk = j1939_sk(sk);
+
+	if (jsk->state & J1939_SOCK_ERRQUEUE)
+		return;
+
 	sk->sk_err = err;
 
 	sk_error_report(sk);
-- 
2.39.2
Re: [PATCH v1] can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in J1939 Socket
Posted by Marc Kleine-Budde 11 months ago
On 26.05.2023 10:19:46, Oleksij Rempel wrote:
> This patch addresses an issue within the j1939_sk_send_loop_abort()
> function in the j1939/socket.c file, specifically in the context of
> Transport Protocol (TP) sessions.
> 
> Without this patch, when a TP session is initiated and a Clear To Send
> (CTS) frame is received from the remote side requesting one data packet,
> the kernel dispatches the first Data Transfer (DT) frame and then waits
> for the next CTS. If the remote side doesn't respond with another CTS,
> the kernel aborts the session due to a timeout. User space then receives
> EPOLLERR on the socket, and the socket becomes active.
> 
> However, when user space tries to read the error queue from the socket
> with sock.recvmsg(, , socket.MSG_ERRQUEUE), the call returns -EAGAIN,
> given that the socket is non-blocking. This situation results in an
> infinite loop: user space repeatedly calls epoll(), epoll() returns
> the socket file descriptor with EPOLLERR, but the read of the error
> queue with recvmsg() fails with -EAGAIN again.
> 
> This patch introduces an additional check for the J1939_SOCK_ERRQUEUE
> flag within the j1939_sk_send_loop_abort() function. If the flag is set,
> the application has subscribed to receive error queue messages, and the
> kernel can communicate the current transfer state via the error queue.
> The function therefore returns early, which avoids needlessly putting
> the socket into an error state and breaks the infinite loop. A socket
> error is only needed if the application isn't using the error queue,
> since without one it would not be aware of transfer issues.
> 
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Reported-by: David Jander <david@protonic.nl>
> Tested-by: David Jander <david@protonic.nl>
> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

Applied to linux-can, added stable on Cc.

Thanks,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |