[PATCH net-next] selftests: ncdevmem: don't retry EFAULT

Stanislav Fomichev posted 1 patch 4 weeks ago
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++
1 file changed, 4 insertions(+)
[PATCH net-next] selftests: ncdevmem: don't retry EFAULT
Posted by Stanislav Fomichev 4 weeks ago
devmem test fails on NIPA. Most likely we get skb(s) with readable
frags (why?) but the failure manifests as an OOM. The OOM happens
because ncdevmem spams the following message:

  recvmsg ret=-1
  recvmsg: Bad address

As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)

Exit (cleanly) with error when recvmsg returns EFAULT. This should at
least cause the test to cleanup its state.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
---
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 8dc9511d046f..c0a22938bed2 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -945,6 +945,10 @@ static int do_server(struct memory_buffer *mem)
 			continue;
 		if (ret < 0) {
 			perror("recvmsg");
+			if (errno == EFAULT) {
+				pr_err("received EFAULT, won't recover");
+				goto err_close_client;
+			}
 			continue;
 		}
 		if (ret == 0) {
-- 
2.51.0
Re: [PATCH net-next] selftests: ncdevmem: don't retry EFAULT
Posted by Mina Almasry 4 weeks ago
On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> devmem test fails on NIPA. Most likely we get skb(s) with readable
> frags (why?)

I would expect if we get readable frags that the frags land in the
host buffer we provide ncdevmem and we actually hit this error:

```
  1                 if (!is_devmem) {
  0                         pr_err("flow steering error");
  1                         goto err_close_client;
  2                 }
```

which as it says, should be root caused in a flow steering error. I
don't know what would cause an EFAULT off the top of my head.

> but the failure manifests as an OOM. The OOM happens
> because ncdevmem spams the following message:
>
>   recvmsg ret=-1
>   recvmsg: Bad address
>
> As of today, ncdevmem can't deal with various reasons of EFAULT:
> - falling back to regular recvmsg for non-devmem skbs
> - increasing ctrl_data size (can't happen with ncdevmem's large buffer)
>
> Exit (cleanly) with error when recvmsg returns EFAULT. This should at
> least cause the test to cleanup its state.
>
> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>

Either way, change looks good to me.

Reviewed-by: Mina Almasry <almasrymina@google.com>

-- 
Thanks,
Mina
Re: [PATCH net-next] selftests: ncdevmem: don't retry EFAULT
Posted by Stanislav Fomichev 3 weeks, 6 days ago
On 09/04, Mina Almasry wrote:
> On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > devmem test fails on NIPA. Most likely we get skb(s) with readable
> > frags (why?)
> 
> I would expect if we get readable frags that the frags land in the
> host buffer we provide ncdevmem and we actually hit this error:
> 
> ```
>   1                 if (!is_devmem) {
>   0                         pr_err("flow steering error");
>   1                         goto err_close_client;
>   2                 }
> ```
> 
> which as it says, should be root caused in a flow steering error. I
> don't know what would cause an EFAULT off the top of my head.

Yea, I don't understand what happens :-( I'm thinking of doing the
following as well:

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 40b774b4f587..0c18a8c7965f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
 							 used);
 				if (err <= 0) {
 					if (!copied)
-						copied = -EFAULT;
+						copied = err;
 
 					break;
 				}

Should give us more info for the devmem case... LMK if you don't like
it. If I don't hear from you in a couple of days, I'll send it out..
Re: [PATCH net-next] selftests: ncdevmem: don't retry EFAULT
Posted by Mina Almasry 3 weeks, 6 days ago
On Fri, Sep 5, 2025 at 8:22 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 09/04, Mina Almasry wrote:
> > On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > devmem test fails on NIPA. Most likely we get skb(s) with readable
> > > frags (why?)
> >
> > I would expect if we get readable frags that the frags land in the
> > host buffer we provide ncdevmem and we actually hit this error:
> >
> > ```
> >   1                 if (!is_devmem) {
> >   0                         pr_err("flow steering error");
> >   1                         goto err_close_client;
> >   2                 }
> > ```
> >
> > which as it says, should be root caused in a flow steering error. I
> > don't know what would cause an EFAULT off the top of my head.
>
> Yea, I don't understand what happens :-( I'm thinking of doing the
> following as well:
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 40b774b4f587..0c18a8c7965f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
>                                                          used);
>                                 if (err <= 0) {
>                                         if (!copied)
> -                                               copied = -EFAULT;
> +                                               copied = err;
>
>                                         break;
>                                 }
>
> Should give us more info for the devmem case... LMK if you don't like
> it. If I don't hear from you in a couple of days, I'll send it out..

Hmm, the other code paths overwrite the error to EFAULT; I don't know
if that's significant in some way. But seems fine to me, I don't see
why not do this, other than maybe potentional confusion with recvmsg
returning an error not documented here:

https://linux.die.net/man/2/recvmsg

But that seems a minor point.


-- 
Thanks,
Mina
Re: [PATCH net-next] selftests: ncdevmem: don't retry EFAULT
Posted by Stanislav Fomichev 3 weeks, 6 days ago
On 09/05, Mina Almasry wrote:
> On Fri, Sep 5, 2025 at 8:22 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 09/04, Mina Almasry wrote:
> > > On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > devmem test fails on NIPA. Most likely we get skb(s) with readable
> > > > frags (why?)
> > >
> > > I would expect if we get readable frags that the frags land in the
> > > host buffer we provide ncdevmem and we actually hit this error:
> > >
> > > ```
> > >   1                 if (!is_devmem) {
> > >   0                         pr_err("flow steering error");
> > >   1                         goto err_close_client;
> > >   2                 }
> > > ```
> > >
> > > which as it says, should be root caused in a flow steering error. I
> > > don't know what would cause an EFAULT off the top of my head.
> >
> > Yea, I don't understand what happens :-( I'm thinking of doing the
> > following as well:
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 40b774b4f587..0c18a8c7965f 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> >                                                          used);
> >                                 if (err <= 0) {
> >                                         if (!copied)
> > -                                               copied = -EFAULT;
> > +                                               copied = err;
> >
> >                                         break;
> >                                 }
> >
> > Should give us more info for the devmem case... LMK if you don't like
> > it. If I don't hear from you in a couple of days, I'll send it out..
> 
> Hmm, the other code paths overwrite the error to EFAULT; I don't know
> if that's significant in some way. But seems fine to me, I don't see
> why not do this, other than maybe potentional confusion with recvmsg
> returning an error not documented here:
> 
> https://linux.die.net/man/2/recvmsg
> 
> But that seems a minor point.

SG! Exporting new errnos seems to be ok because we are dealing with a
new devmem mode? I think the bug we have on the fbnic is that we don't
properly set skb->unreadable, so it's completely unrelated to this,
but still feels like it might help with future debugging..