tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++ 1 file changed, 4 insertions(+)
devmem test fails on NIPA. Most likely we get skb(s) with readable
frags (why?) but the failure manifests as an OOM. The OOM happens
because ncdevmem spams the following message:
recvmsg ret=-1
recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at
least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
---
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 8dc9511d046f..c0a22938bed2 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -945,6 +945,10 @@ static int do_server(struct memory_buffer *mem)
continue;
if (ret < 0) {
perror("recvmsg");
+ if (errno == EFAULT) {
+ pr_err("received EFAULT, won't recover");
+ goto err_close_client;
+ }
continue;
}
if (ret == 0) {
--
2.51.0
On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote: > > devmem test fails on NIPA. Most likely we get skb(s) with readable > frags (why?) I would expect if we get readable frags that the frags land in the host buffer we provide ncdevmem and we actually hit this error: ``` 1 if (!is_devmem) { 0 pr_err("flow steering error"); 1 goto err_close_client; 2 } ``` which as it says, should be root caused in a flow steering error. I don't know what would cause an EFAULT off the top of my head. > but the failure manifests as an OOM. The OOM happens > because ncdevmem spams the following message: > > recvmsg ret=-1 > recvmsg: Bad address > > As of today, ncdevmem can't deal with various reasons of EFAULT: > - falling back to regular recvmsg for non-devmem skbs > - increasing ctrl_data size (can't happen with ncdevmem's large buffer) > > Exit (cleanly) with error when recvmsg returns EFAULT. This should at > least cause the test to cleanup its state. > > Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Either way, change looks good to me. Reviewed-by: Mina Almasry <almasrymina@google.com> -- Thanks, Mina
On 09/04, Mina Almasry wrote: > On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote: > > > > devmem test fails on NIPA. Most likely we get skb(s) with readable > > frags (why?) > > I would expect if we get readable frags that the frags land in the > host buffer we provide ncdevmem and we actually hit this error: > > ``` > 1 if (!is_devmem) { > 0 pr_err("flow steering error"); > 1 goto err_close_client; > 2 } > ``` > > which as it says, should be root caused in a flow steering error. I > don't know what would cause an EFAULT off the top of my head. Yea, I don't understand what happens :-( I'm thinking of doing the following as well: diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 40b774b4f587..0c18a8c7965f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, used); if (err <= 0) { if (!copied) - copied = -EFAULT; + copied = err; break; } Should give us more info for the devmem case... LMK if you don't like it. If I don't hear from you in a couple of days, I'll send it out..
On Fri, Sep 5, 2025 at 8:22 AM Stanislav Fomichev <stfomichev@gmail.com> wrote: > > On 09/04, Mina Almasry wrote: > > On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote: > > > > > > devmem test fails on NIPA. Most likely we get skb(s) with readable > > > frags (why?) > > > > I would expect if we get readable frags that the frags land in the > > host buffer we provide ncdevmem and we actually hit this error: > > > > ``` > > 1 if (!is_devmem) { > > 0 pr_err("flow steering error"); > > 1 goto err_close_client; > > 2 } > > ``` > > > > which as it says, should be root caused in a flow steering error. I > > don't know what would cause an EFAULT off the top of my head. > > Yea, I don't understand what happens :-( I'm thinking of doing the > following as well: > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 40b774b4f587..0c18a8c7965f 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, > used); > if (err <= 0) { > if (!copied) > - copied = -EFAULT; > + copied = err; > > break; > } > > Should give us more info for the devmem case... LMK if you don't like > it. If I don't hear from you in a couple of days, I'll send it out.. Hmm, the other code paths overwrite the error to EFAULT; I don't know if that's significant in some way. But seems fine to me, I don't see why not do this, other than maybe potentional confusion with recvmsg returning an error not documented here: https://linux.die.net/man/2/recvmsg But that seems a minor point. -- Thanks, Mina
On 09/05, Mina Almasry wrote: > On Fri, Sep 5, 2025 at 8:22 AM Stanislav Fomichev <stfomichev@gmail.com> wrote: > > > > On 09/04, Mina Almasry wrote: > > > On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote: > > > > > > > > devmem test fails on NIPA. Most likely we get skb(s) with readable > > > > frags (why?) > > > > > > I would expect if we get readable frags that the frags land in the > > > host buffer we provide ncdevmem and we actually hit this error: > > > > > > ``` > > > 1 if (!is_devmem) { > > > 0 pr_err("flow steering error"); > > > 1 goto err_close_client; > > > 2 } > > > ``` > > > > > > which as it says, should be root caused in a flow steering error. I > > > don't know what would cause an EFAULT off the top of my head. > > > > Yea, I don't understand what happens :-( I'm thinking of doing the > > following as well: > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index 40b774b4f587..0c18a8c7965f 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, > > used); > > if (err <= 0) { > > if (!copied) > > - copied = -EFAULT; > > + copied = err; > > > > break; > > } > > > > Should give us more info for the devmem case... LMK if you don't like > > it. If I don't hear from you in a couple of days, I'll send it out.. > > Hmm, the other code paths overwrite the error to EFAULT; I don't know > if that's significant in some way. But seems fine to me, I don't see > why not do this, other than maybe potentional confusion with recvmsg > returning an error not documented here: > > https://linux.die.net/man/2/recvmsg > > But that seems a minor point. SG! Exporting new errnos seems to be ok because we are dealing with a new devmem mode? I think the bug we have on the fbnic is that we don't properly set skb->unreadable, so it's completely unrelated to this, but still feels like it might help with future debugging..
© 2016 - 2025 Red Hat, Inc.