When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
the netlink socket. If the wait timeout fully expires (timeo == 0),
netlink mistakenly interprets the zeroed timeout as a non-blocking
request. It then triggers netlink_overrun that drops the event,
completely bypassing the audit subsystem's internal retry queue, and
falsely returns ENOBUFS to user-space, resulting in the following error:

  auditd[]: Error receiving audit netlink packet (No buffer space available)

Fix this by detecting when a blocking sender's timeout has expired
(timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
on the next iteration), safely free the skb and return -EAGAIN, allowing
the audit subsystem to gracefully enqueue the pending event into its
internal backlog.

Suggested-by: Steve Grubb <sgrubb@redhat.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
---
Changes in v2:
- Use the simple check (timeo == 0 && !nonblock) to detect
  expired timeout, avoiding adding a new NETLINK flag.

 net/netlink/af_netlink.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2aeb0680807d..fdc3db74b178 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	err = netlink_attachskb(sk, skb, &timeo, ssk);
-	if (err == 1)
+	if (err == 1) {
+		/* timeo may have been zeroed by schedule_timeout inside
+		 * netlink_attachskb. If the caller is a timed-blocking sender
+		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
+		 * that would misfire netlink_overrun on the next iteration.
+		 */
+		if (timeo == 0 && !nonblock) {
+			kfree_skb(skb);
+			return -EAGAIN;
+		}
 		goto retry;
+	}
 	if (err)
 		return err;
--
2.53.0
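
To make the control flow described in the commit message easier to follow, here is a minimal user-space sketch of the retry loop. It is a toy model, not kernel code: `toy_sock`, `toy_attachskb()` and `toy_unicast()` are hypothetical names, the sleep is reduced to zeroing the timeout, and the only thing tracked is whether the overrun path (which sets ENOBUFS on the receiver in the real code) was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy receiver: tracks only whether its buffer is full and whether an
 * overrun (ENOBUFS in the real code) was reported. Hypothetical type. */
struct toy_sock {
	bool full;
	bool overrun_reported;
};

enum toy_result { TOY_DELIVERED, TOY_EAGAIN };

/* Mirrors the shape of netlink_attachskb(): 0 = attached, 1 = caller
 * should retry, -1 = gave up (the real function returns -EAGAIN here). */
static int toy_attachskb(struct toy_sock *sk, long *timeo)
{
	if (!sk->full)
		return 0;
	if (*timeo == 0) {
		/* Indistinguishable from a non-blocking send: report overrun. */
		sk->overrun_reported = true;
		return -1;
	}
	*timeo = 0;	/* slept for the whole timeout, nothing was drained */
	return 1;
}

static enum toy_result toy_unicast(struct toy_sock *sk, long sndtimeo,
				   bool nonblock, bool with_fix)
{
	long timeo = nonblock ? 0 : sndtimeo;
	int err;

	assert(sndtimeo > 0);
retry:
	err = toy_attachskb(sk, &timeo);
	if (err == 1) {
		/* The v2 check: an expired blocking send bails out early. */
		if (with_fix && timeo == 0 && !nonblock)
			return TOY_EAGAIN;
		goto retry;
	}
	return err ? TOY_EAGAIN : TOY_DELIVERED;
}
```

In this model a blocking sender facing a full receiver gets `TOY_EAGAIN` either way; the difference is that without the check the second pass through `toy_attachskb()` also marks the receiver as overrun, which is the behaviour the patch aims to avoid.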
On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> the netlink socket.

Holding socket lock during slow IO sounds very wrong. One could say -
that's abuse of the socket lock?

> If the wait timeout fully expires (timeo == 0),
> netlink mistakenly interprets the zeroed timeout as a non-blocking
> request. It then triggers netlink_overrun that drops the event,
> completely bypassing the audit subsystem's internal retry queue, and
> falsely returns ENOBUFS to user-space, resulting in the following error:
>
> auditd[]: Error receiving audit netlink packet (No buffer space available)
>
> Fix this by detecting when a blocking sender's timeout has expired
> (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> on the next iteration), safely free the skb and return -EAGAIN, allowing
> the audit subsystem to gracefully enqueue the pending event into its
> internal backlog.

The socket _is_ the queue, normally.

Please explore fixing this in audit?
--
pw-bot: cr

On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > the netlink socket.
>
> Holding socket lock during slow IO sounds very wrong. One could say -
> that's abuse of the socket lock?
>
> > If the wait timeout fully expires (timeo == 0),
> > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > request. It then triggers netlink_overrun that drops the event,
> > completely bypassing the audit subsystem's internal retry queue, and
> > falsely returns ENOBUFS to user-space, resulting in the following error:
> >
> > auditd[]: Error receiving audit netlink packet (No buffer space available)
> >
> > Fix this by detecting when a blocking sender's timeout has expired
> > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > the audit subsystem to gracefully enqueue the pending event into its
> > internal backlog.
>
> The socket _is_ the queue, normally.
>
> Please explore fixing this in audit?
> --
> pw-bot: cr
>

Hi Jakub,

Thanks for reviewing this patch as well.

First, regarding the lock: kauditd does not hold the socket lock during
slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
nlk->wait (a wait queue). No socket lock or mutex is held during the sleep.

Second, regarding an audit-only fix: the symptom manifests as sk->sk_err =
ENOBUFS set inside netlink_overrun() (called from netlink_attachskb when
timeo == 0). Audit has no mechanism to prevent or clear this socket state
from the outside. Potential workarounds all fail:

(1) Clearing sk_err after the fact is racy and affects other socket ops
(2) Avoiding timeouts entirely defeats the anti-deadlock mechanism
(3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
    kernels where this bug is actively impacting users

I've submitted v3 [1] with NETLINK_UNICAST_TIMED as an explicit opt-in
constant. This is strictly additive and leaves all existing callers
untouched:

- Standard blocking (0): unchanged
- Standard non-blocking (1, MSG_DONTWAIT): unchanged
- Timed blocking (2): new opt-in for audit

Would you mind reviewing the v3 and checking if it would be considered?

[1] https://lore.kernel.org/audit/20260527192150.949400-1-rrobaina@redhat.com/T/#u

Best regards,
Ricardo

On Wed, 27 May 2026 16:29:37 -0300 Ricardo Robaina wrote:
> On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > > the netlink socket.
> >
> > Holding socket lock during slow IO sounds very wrong. One could say -
> > that's abuse of the socket lock?
> >
> > > If the wait timeout fully expires (timeo == 0),
> > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > request. It then triggers netlink_overrun that drops the event,
> > > completely bypassing the audit subsystem's internal retry queue, and
> > > falsely returns ENOBUFS to user-space, resulting in the following error:
> > >
> > > auditd[]: Error receiving audit netlink packet (No buffer space available)
> > >
> > > Fix this by detecting when a blocking sender's timeout has expired
> > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > > the audit subsystem to gracefully enqueue the pending event into its
> > > internal backlog.
> >
> > The socket _is_ the queue, normally.
> >
> > Please explore fixing this in audit?
> > --
> > pw-bot: cr
> >
>
> Hi Jakub,
>
> Thanks for reviewing this patch as well.
>
> First, regarding the lock: kauditd does not hold the socket lock during
> slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
> nlk->wait (a wait queue). No socket lock or mutex is held during the sleep.

So you're saying the queue _is_ actually congested?
netlink_attachskb() sleeps because there's no space left in the socket's
rcvbuf? So the skbs are moved to audit_retry_queue "temporarily" until
user space drains its socket and kernel can succeed sending?

Could you confirm this understanding is correct?

> Second, regarding an audit-only fix: the symptom manifests as sk->sk_err =
> ENOBUFS set inside netlink_overrun() (called from netlink_attachskb when
> timeo == 0). Audit has no mechanism to prevent or clear this socket state
> from the outside. Potential workarounds all fail:
>
> (1) Clearing sk_err after the fact is racy and affects other socket ops

Why would you clear the sk_err, it's the reader's responsibility to
clear the congestion and the reader is AFAIU a user space process.

> (2) Avoiding timeouts entirely defeats the anti-deadlock mechanism

What's the anti-deadlock mechanism?

> (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> kernels where this bug is actively impacting users

Which commit are you referring to? Isn't that flag itself ancient?

> I've submitted v3 [1] with NETLINK_UNICAST_TIMED as an explicit opt-in
> constant.

It's really not great to fall silent for 10+ days, then respond and
immediately posts equally pointless next version of the patch :/

On Wednesday, May 27, 2026 6:29:36 PM Eastern Daylight Time Jakub Kicinski wrote:
> On Wed, 27 May 2026 16:29:37 -0300 Ricardo Robaina wrote:
> > On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > > > the netlink socket.
> > >
> > > Holding socket lock during slow IO sounds very wrong. One could say -
> > > that's abuse of the socket lock?
> > >
> > > > If the wait timeout fully expires (timeo == 0),
> > > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > > request. It then triggers netlink_overrun that drops the event,
> > > > completely bypassing the audit subsystem's internal retry queue, and
> > > > falsely returns ENOBUFS to user-space, resulting in the following error:
> > > >
> > > > auditd[]: Error receiving audit netlink packet (No buffer space available)
> > > >
> > > > Fix this by detecting when a blocking sender's timeout has expired
> > > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > > > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > > > the audit subsystem to gracefully enqueue the pending event into its
> > > > internal backlog.
> > >
> > > The socket _is_ the queue, normally.
> > >
> > > Please explore fixing this in audit?
> > > --
> > > pw-bot: cr
> >
> > Hi Jakub,
> >
> > Thanks for reviewing this patch as well.
> >
> > First, regarding the lock: kauditd does not hold the socket lock during
> > slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
> > nlk->wait (a wait queue). No socket lock or mutex is held during the
> > sleep.
>
> So you're saying the queue _is_ actually congested?

Yes, the socket buffer is genuinely full because auditd can't drain fast
enough.

> netlink_attachskb() sleeps because there's no space left in the socket's
> rcvbuf? So the skbs are moved to audit_retry_queue "temporarily" until
> user space drains its socket and kernel can succeed sending?
>
> Could you confirm this understanding is correct?

Yes. kauditd sleeps in netlink_attachskb, the HZ/10 timeout expires, and the
skb is moved to audit_retry_queue until auditd drains enough for delivery to
succeed. The record is not lost.

> > Second, regarding an audit-only fix: the symptom manifests as sk->sk_err
> > = ENOBUFS set inside netlink_overrun() (called from netlink_attachskb
> > when timeo == 0). Audit has no mechanism to prevent or clear this socket
> > state from the outside. Potential workarounds all fail:
> >
> > (1) Clearing sk_err after the fact is racy and affects other socket ops
>
> Why would you clear the sk_err, it's the reader's responsibility to
> clear the congestion and the reader is AFAIU a user space process.

The reader is in a fight to clear the congestion. But 1 reader thread vs 32
cores, the reader can get backlogged. It doesn't happen very often, but it
does once in a great while. The reader doesn't want an ENOBUFS and logs that
as an exceptional condition when that happens. It wants to rely on the
kernel's backlog mechanism.

> > (2) Avoiding timeouts entirely defeats the anti-deadlock mechanism
>
> What's the anti-deadlock mechanism?

sk_sndtimeo = HZ/10, set in audit_net_init(). Without it, kauditd would sleep
indefinitely in netlink_attachskb if auditd is stalled or dead. The timeout
lets kauditd escape and route the skb to its retry queue.

> > (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> > kernels where this bug is actively impacting users
>
> Which commit are you referring to? Isn't that flag itself ancient?

You're right, it is. I see how this flag would fix the pathological behavior
that was reported. But as I have looked at this suggestion, there seems to be
one wrinkle. User space should not need to know that the audit code in the
kernel has this retry mechanism. It seems like the audit subsystem should set
the flag on auditd's socket at registration time in auditd_set(). The kernel
is the right place for this because it's the kernel that manages the retry/
hold queues and sets the sk_sndtimeo that triggers the overrun path - auditd
has no knowledge of these internals.

NETLINK_F_RECV_NO_ENOBUFS and nlk_sk are private to net/netlink/af_netlink.h,
so audit.c can't set the flag directly. Should we propose a small exported
helper, netlink_sock_set_no_enobufs(), that mirrors the existing
setsockopt(NETLINK_NO_ENOBUFS) handler? Then the rest of the fix itself lives
entirely in kernel/audit.c as you suggested.

Something like:

void netlink_sock_set_no_enobufs(struct sock *sk)
{
	struct netlink_sock *nlk = nlk_sk(sk);

	nlk->flags |= NETLINK_F_RECV_NO_ENOBUFS;
	clear_bit(NETLINK_S_CONGESTED, &nlk->state);
	wake_up_interruptible(&nlk->wait);
}

and then auditd_set() calls this as it sets up the connection. Is this
the way you wanted to handle this?

-Steve

On Thu, 28 May 2026 18:40:44 -0400 Steve Grubb wrote:
> > > (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> > > kernels where this bug is actively impacting users
> >
> > Which commit are you referring to? Isn't that flag itself ancient?
>
> You're right, it is. I see how this flag would fix the pathological behavior
> that was reported. But as I have looked at this suggestion, there seems to be
> one wrinkle. User space should not need to know that the audit code in the
> kernel has this retry mechanism.

It's not about the retry mechanism, at least in my mind - I read
your reply as "user space should not know that there was congestion".
Why? It's not very useful, I get that, but user space can just clear
the congestion signal and keep going.

> It seems like the audit subsystem should set the flag on auditd's
> socket at registration time in auditd_set(). The kernel is the right
> place for this because it's the kernel that manages the retry/ hold
> queues and sets the sk_sndtimeo that triggers the overrun path -
> auditd has no knowledge of these internals.

We have to carry this code somewhere, either in user space or in
the kernel. I'd prefer not to carry it in the kernel.

Hello,

On Thursday, May 28, 2026 7:29:01 PM Eastern Daylight Time Jakub Kicinski wrote:
> On Thu, 28 May 2026 18:40:44 -0400 Steve Grubb wrote:
> > > > (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> > > > kernels where this bug is actively impacting users
> > >
> > > Which commit are you referring to? Isn't that flag itself ancient?
> >
> > You're right, it is. I see how this flag would fix the pathological
> > behavior that was reported. But as I have looked at this suggestion,
> > there seems to be one wrinkle. User space should not need to know that
> > the audit code in the kernel has this retry mechanism.
>
> It's not about the retry mechanism, at least in my mind - I read
> your reply as "user space should not know that there was congestion".
> Why?

In the audit case, it is not useful. I know there can be an endless supply
and there's not much that can be done except dequeueing what's next.

> It's not very useful, I get that, but user space can just clear
> the congestion signal and keep going.

How? The recvfrom man page doesn't even discuss ENOBUFS. Which is one of the
strongest arguments for a kernel side patch. The fact that there exists a
socket option to declare that you do not want ENOBUFS on netlink sockets is
esoteric knowledge. The netlink(7) man page does cover the flag. But even
where it discusses ENOBUFS, it does not mention that this is preventable by
setting a socket option. I do appreciate this being pointed out. But getting
from the recvfrom man page to a solution is not obvious.

> > It seems like the audit subsystem should set the flag on auditd's
> > socket at registration time in auditd_set(). The kernel is the right
> > place for this because it's the kernel that manages the retry/ hold
> > queues and sets the sk_sndtimeo that triggers the overrun path -
> > auditd has no knowledge of these internals.
>
> We have to carry this code somewhere, either in user space or in
> the kernel. I'd prefer not to carry it in the kernel.

I can put this in the audit daemon. But whoever else writes a similar app
will have to independently discover the same solution when faced with the
pathologically bad behavior. A kernel side fix would have made it easier for
future app developers to be successful.

-Steve

On Tue, 09 Jun 2026 13:40:23 -0400 Steve Grubb wrote:
> > > You're right, it is. I see how this flag would fix the pathological
> > > behavior that was reported. But as I have looked at this suggestion,
> > > there seems to be one wrinkle. User space should not need to know that
> > > the audit code in the kernel has this retry mechanism.
> >
> > It's not about the retry mechanism, at least in my mind - I read
> > your reply as "user space should not know that there was congestion".
> > Why?
>
> In the audit case, it is not useful. I know there can be an endless supply
> and there's not much that can be done except dequeueing what's next.
>
> > It's not very useful, I get that, but user space can just clear
> > the congestion signal and keep going.
>
> How? The recvfrom man page doesn't even discuss ENOBUFS. Which is one of the
> strongest arguments for a kernel side patch. The fact that there exists a
> socket option to declare that you do not want ENOBUFS on netlink sockets is
> esoteric knowledge. The netlink(7) man page does cover the flag. But even
> where it discusses ENOBUFS, it does not mention that this is preventable by
> setting a socket option. I do appreciate this being pointed out. But getting
> from the recvfrom man page to a solution is not obvious.

Socket errors are generally "consumed" when they are returned.
The user space should see one ENOBUFS and then once the rcvbuf is drained
completely the CONGESTION bit should also get auto cleared. This is my
mental model how Netlink works, LMK if you're seeing different behavior,
my memory is faulty...

> > > It seems like the audit subsystem should set the flag on auditd's
> > > socket at registration time in auditd_set(). The kernel is the right
> > > place for this because it's the kernel that manages the retry/ hold
> > > queues and sets the sk_sndtimeo that triggers the overrun path -
> > > auditd has no knowledge of these internals.
> >
> > We have to carry this code somewhere, either in user space or in
> > the kernel. I'd prefer not to carry it in the kernel.
>
> I can put this in the audit daemon. But whoever else writes a similar app
> will have to independently discover the same solution when faced with the
> pathologically bad behavior. A kernel side fix would have made it easier for
> future app developers to be successful.

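
The "consume the error and keep going" model described in this message can be sketched from the reader's side as follows. This is only an illustration under that stated mental model, not code from auditd or the kernel; `recv_past_enobufs()` and its `overruns` counter are hypothetical names, and only standard `recv(2)` and `errno` semantics are relied on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive one message, treating ENOBUFS as a one-shot overrun notice:
 * count it and read again instead of failing. Hypothetical helper. */
static ssize_t recv_past_enobufs(int fd, void *buf, size_t len,
				 unsigned int *overruns)
{
	assert(overruns != NULL);

	for (;;) {
		ssize_t n = recv(fd, buf, len, 0);

		if (n >= 0)
			return n;
		if (errno == ENOBUFS) {
			/* The pending socket error was consumed by this
			 * call; messages may have been dropped upstream. */
			(*overruns)++;
			continue;
		}
		if (errno == EINTR)
			continue;
		return -1;	/* any other error is left to the caller */
	}
}
```

A reader built this way still learns that an overrun happened (through the counter) but does not treat it as fatal; whether that is acceptable for a given consumer is the policy question being debated in the thread.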
Hello,

On Tuesday, June 9, 2026 5:35:58 PM Eastern Daylight Time Jakub Kicinski wrote:
> On Tue, 09 Jun 2026 13:40:23 -0400 Steve Grubb wrote:
> > > > You're right, it is. I see how this flag would fix the pathological
> > > > behavior that was reported. But as I have looked at this suggestion,
> > > > there seems to be one wrinkle. User space should not need to know
> > > > that the audit code in the kernel has this retry mechanism.
> > >
> > > It's not about the retry mechanism, at least in my mind - I read
> > > your reply as "user space should not know that there was congestion".
> > > Why?
> >
> > In the audit case, it is not useful. I know there can be an endless
> > supply and there's not much that can be done except dequeueing what's
> > next.
> >
> > > It's not very useful, I get that, but user space can just clear
> > > the congestion signal and keep going.
> >
> > How? The recvfrom man page doesn't even discuss ENOBUFS. Which is one of
> > the strongest arguments for a kernel side patch. The fact that there
> > exists a socket option to declare that you do not want ENOBUFS on
> > netlink sockets is esoteric knowledge. The netlink(7) man page does
> > cover the flag. But even where it discusses ENOBUFS, it does not mention
> > that this is preventable by setting a socket option. I do appreciate
> > this being pointed out. But getting from the recvfrom man page to a
> > solution is not obvious.
>
> Socket errors are generally "consumed" when they are returned.
> The user space should see one ENOBUFS

It does. The man page is unhelpful.

> and then once the rcvbuf is drained completely the CONGESTION bit should
> also get auto cleared. This is my mental model how Netlink works, LMK if
> you're seeing different behavior, my memory is faulty...

Well, yes that is normal for other netlink subsystems. And it looks like
we're not missing any magic cure. However, that congestion bit is what
really causes a major headache for the audit system because it has its own
retry scaffolding. Auditd acks each message so that the kernel side
advances.

In any event, I patched auditd to set NETLINK_NO_ENOBUFS. It's not the
solution I hoped for, but if I understand what this does, it should solve
our problem for auditd.

Thanks,
-Steve

> > > > It seems like the audit subsystem should set the flag on auditd's
> > > > socket at registration time in auditd_set(). The kernel is the right
> > > > place for this because it's the kernel that manages the retry/ hold
> > > > queues and sets the sk_sndtimeo that triggers the overrun path -
> > > > auditd has no knowledge of these internals.
> > >
> > > We have to carry this code somewhere, either in user space or in
> > > the kernel. I'd prefer not to carry it in the kernel.
> >
> > I can put this in the audit daemon. But whoever else writes a similar app
> > will have to independently discover the same solution when faced with the
> > pathologically bad behavior. A kernel side fix would have made it easier
> > for future app developers to be successful.

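
For readers wondering what setting NETLINK_NO_ENOBUFS from user space looks like, a minimal sketch follows. It is not the actual auditd change, which is not shown in this thread; `netlink_disable_enobufs()` is a hypothetical helper name, and the `SOL_NETLINK` fallback is only there in case the libc headers in use do not define it.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270	/* fallback for libc headers that lack it */
#endif

/* Ask the kernel not to report ENOBUFS on this netlink socket.
 * Returns 0 on success, -1 with errno set on failure. */
static int netlink_disable_enobufs(int fd)
{
	int one = 1;

	assert(fd >= 0);
	return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
			  &one, sizeof(one));
}
```

A daemon would typically call this once, right after creating its netlink socket and before it starts receiving.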
On Mon, May 18, 2026 at 8:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > the netlink socket.
>
> Holding socket lock during slow IO sounds very wrong. One could say -
> that's abuse of the socket lock?

It's no different than any other kernel subsystem sending netlink
packets to userspace, although in some configurations the rate at
which audit sends netlink traffic is likely much higher than the
majority of netlink users.

Arguably, audit probably never should have used netlink, but that
decision happened a long time ago and there were other issues
complicating the decision.

> > If the wait timeout fully expires (timeo == 0),
> > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > request. It then triggers netlink_overrun that drops the event,
> > completely bypassing the audit subsystem's internal retry queue, and
> > falsely returns ENOBUFS to user-space, resulting in the following error:
> >
> > auditd[]: Error receiving audit netlink packet (No buffer space available)
> >
> > Fix this by detecting when a blocking sender's timeout has expired
> > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > the audit subsystem to gracefully enqueue the pending event into its
> > internal backlog.
>
> The socket _is_ the queue, normally.

There is a joke in there about audit and "normal", but I'll leave that
as an exercise for the reader. I will say that audit has a lot of
unique requirements regarding queue management and that dictates a lot
of the wacky stuff audit has to do with its record queue; the
standard socket buffer functionality doesn't have everything, and I
wouldn't want to ask for it to be augmented in a way that satisfies
audit.

> Please explore fixing this in audit?

Ricardo, I was kinda hoping not to have to do this in audit, but I
think you can probably get away with just open-coding
netlink_unicast() in audit and then going from there ... we might want
to do some other things differently, but let's see what a basic patch
looks like before we spend a lot of time redesigning it.

--
paul-moore.com

On Tue, May 26, 2026 at 5:54 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, May 18, 2026 at 8:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > > the netlink socket.
> >
> > Holding socket lock during slow IO sounds very wrong. One could say -
> > that's abuse of the socket lock?
>
> It's no different than any other kernel subsystem sending netlink
> packets to userspace, although in some configurations the rate at
> which audit sends netlink traffic is likely much higher than the
> majority of netlink users.
>
> Arguably, audit probably never should have used netlink, but that
> decision happened a long time ago and there were other issues
> complicating the decision.
>
> > > If the wait timeout fully expires (timeo == 0),
> > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > request. It then triggers netlink_overrun that drops the event,
> > > completely bypassing the audit subsystem's internal retry queue, and
> > > falsely returns ENOBUFS to user-space, resulting in the following error:
> > >
> > > auditd[]: Error receiving audit netlink packet (No buffer space available)
> > >
> > > Fix this by detecting when a blocking sender's timeout has expired
> > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > > the audit subsystem to gracefully enqueue the pending event into its
> > > internal backlog.
> >
> > The socket _is_ the queue, normally.
>
> There is a joke in there about audit and "normal", but I'll leave that
> as an exercise for the reader. I will say that audit has a lot of
> unique requirements regarding queue management and that dictates a lot
> of the wacky stuff audit has to do with its record queue; the
> standard socket buffer functionality doesn't have everything, and I
> wouldn't want to ask for it to be augmented in a way that satisfies
> audit.
>
> > Please explore fixing this in audit?
>
> Ricardo, I was kinda hoping not to have to do this in audit, but I
> think you can probably get away with just open-coding
> netlink_unicast() in audit and then going from there ... we might want
> to do some other things differently, but let's see what a basic patch
> looks like before we spend a lot of time redesigning it.
>
> --
> paul-moore.com
>

Hi Paul,

Thank you for your input here, I appreciate it.

Before pursuing the open-coding approach, I wanted to let you know I have
a v3 patch [1] ready that addresses the concerns raised before. It
introduces NETLINK_UNICAST_TIMED as an explicit opt-in constant (value 2)
rather than the broader heuristic in v2.

Would you be open to reviewing this approach first? If you still prefer
the open-coding route or if v3 gets pushed back, I'm happy to go that
direction.

[1] https://lore.kernel.org/audit/20260527192150.949400-1-rrobaina@redhat.com/T/#u

Best regards,
Ricardo

From: 'Simon Horman' <horms@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
netlink, audit: prevent false ENOBUFS on timeout expiry
This commit modifies netlink_unicast() to return -EAGAIN when a blocking
sender's timeout expires. This bypasses the netlink_overrun() call to
allow the audit subsystem to gracefully enqueue the pending event into its
internal backlog without falsely returning ENOBUFS to user-space.
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
> }
>
> err = netlink_attachskb(sk, skb, &timeo, ssk);
> - if (err == 1)
> + if (err == 1) {
> + /* timeo may have been zeroed by schedule_timeout inside
> + * netlink_attachskb. If the caller is a timed-blocking sender
> + * (not genuinely nonblocking), don't re-enter with timeo=0 as
> + * that would misfire netlink_overrun on the next iteration.
> + */
> + if (timeo == 0 && !nonblock) {
> + kfree_skb(skb);
> + return -EAGAIN;

Could this regression cause silent packet drops in non-audit subsystems?

While this change prevents false ENOBUFS for the audit subsystem, other
subsystems like InfiniBand's IWPM use rdma_nl_unicast_wait(), which sets
nonblock=0 and operates on a kernel socket with a timeout.

If the timeout expires because the receiving user-space daemon is
bottlenecked, this new path will free the skb and return -EAGAIN without
ever invoking netlink_overrun().

Without netlink_overrun(), the receiver's socket error is never set to
ENOBUFS. Would this prevent user-space from being notified of the dropped
message, leading to permanent state desynchronization between the kernel
and user-space?

> + }
> goto retry;
> + }
> if (err)
> return err;
On Mon, May 18, 2026 at 8:04 AM Simon Horman <horms@kernel.org> wrote:
>
> From: 'Simon Horman' <horms@kernel.org>
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> netlink, audit: prevent false ENOBUFS on timeout expiry
>
> This commit modifies netlink_unicast() to return -EAGAIN when a blocking
> sender's timeout expires. This bypasses the netlink_overrun() call to
> allow the audit subsystem to gracefully enqueue the pending event into its
> internal backlog without falsely returning ENOBUFS to user-space.
>
> > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> > --- a/net/netlink/af_netlink.c
> > +++ b/net/netlink/af_netlink.c
> > @@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
> > }
> >
> > err = netlink_attachskb(sk, skb, &timeo, ssk);
> > - if (err == 1)
> > + if (err == 1) {
> > + /* timeo may have been zeroed by schedule_timeout inside
> > + * netlink_attachskb. If the caller is a timed-blocking sender
> > + * (not genuinely nonblocking), don't re-enter with timeo=0 as
> > + * that would misfire netlink_overrun on the next iteration.
> > + */
> > + if (timeo == 0 && !nonblock) {
> > + kfree_skb(skb);
> > + return -EAGAIN;
>
> Could this regression cause silent packet drops in non-audit subsystems?
>
> While this change prevents false ENOBUFS for the audit subsystem, other
> subsystems like InfiniBand's IWPM use rdma_nl_unicast_wait(), which sets
> nonblock=0 and operates on a kernel socket with a timeout.
>
> If the timeout expires because the receiving user-space daemon is
> bottlenecked, this new path will free the skb and return -EAGAIN without
> ever invoking netlink_overrun().
>
> Without netlink_overrun(), the receiver's socket error is never set to
> ENOBUFS. Would this prevent user-space from being notified of the dropped
> message, leading to permanent state desynchronization between the kernel
> and user-space?
>
> > + }
> > goto retry;
> > + }
> > if (err)
> > return err;
>
Hi Simon,

Thanks for reviewing this patch!

You are correct that the timeo == 0 && !nonblock heuristic in v2 relies on
an implicit assumption about finite sk_sndtimeo. While RDMA/IWPM with
MAX_SCHEDULE_TIMEOUT would never reach this path in practice, your concern
correctly identifies that the heuristic is not surgical enough.

I've submitted v3 [1] with an explicit NETLINK_UNICAST_TIMED constant
(value 2). Callers must explicitly opt into this contract, leaving IWPM and
all other subsystems completely untouched:

    if (timeo == 0 && nonblock == NETLINK_UNICAST_TIMED)

This ensures zero risk of silent drops or state desynchronization in other
subsystems. Does this address your concern?

[1] https://lore.kernel.org/audit/20260527192150.949400-1-rrobaina@redhat.com/T/#u

Best regards,
Ricardo
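
The difference between the v2 heuristic and the v3 opt-in described above can be illustrated with a small user-space sketch. Only the name NETLINK_UNICAST_TIMED and its value 2 come from this thread; the `TOY_*` constants and both predicate functions are hypothetical, and the actual v3 patch is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_UNICAST_BLOCK	0	/* standard blocking (hypothetical name) */
#define TOY_UNICAST_NONBLOCK	1	/* standard non-blocking (hypothetical name) */
#define NETLINK_UNICAST_TIMED	2	/* name and value as described for v3 */

/* v2 heuristic: any expired sender that is not non-blocking bails out. */
static bool toy_v2_bails_out(long timeo, int nonblock)
{
	assert(timeo >= 0);
	return timeo == 0 && !nonblock;
}

/* v3 check: only callers that explicitly opted in bail out. */
static bool toy_v3_bails_out(long timeo, int nonblock)
{
	assert(timeo >= 0);
	return timeo == 0 && nonblock == NETLINK_UNICAST_TIMED;
}
```

With these definitions, a standard blocking caller whose timeout reached zero is caught by the v2 predicate but not by the v3 one, which is the narrowing the reply describes.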