[RFC PATCH] net/core: use wake_up_interruptible_poll() in sock_def_readable()

Xuewen Yan posted 1 patch 1 week, 6 days ago
[RFC PATCH] net/core: use wake_up_interruptible_poll() in sock_def_readable()
Posted by Xuewen Yan 1 week, 6 days ago
sock_def_readable() currently uses wake_up_interruptible_sync_poll() to
wake up tasks waiting for readable data on a socket. The _sync variant
sets the WF_SYNC flag, which tells the scheduler that the waker will
schedule away soon, so the wakee should stay on the same CPU to avoid
needless cache bouncing.

However, we observed the following call stack:
 -vfs_write
 -sock_write_iter
 -unix_stream_sendmsg
 -sock_def_readable
 -__wake_up_sync_key

In this process-context scenario, the waker does NOT go to sleep
after the wakeup. With WF_SYNC, the scheduler is misled into placing
the wakee on the waker's CPU (via wake_affine_idle()'s sync path when
nr_running == 1), causing both the sender and receiver to contend for
the same CPU. This may hurt throughput for IPC workloads on multi-core
systems where the sender and receiver could otherwise run in parallel
on different CPUs.

Switch to wake_up_interruptible_poll() which does not set WF_SYNC.
This allows the scheduler to freely migrate the wakee to an idle CPU,
enabling true parallelism between the sending and receiving processes.

Co-developed-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
Note:
The possible cost is that for softirq callers (TCP/UDP receive path), the wakee
may be migrated away from the current CPU where the received data is
cache-hot. However, this is mitigated by:
  - The scheduler's existing wake_affine logic which already considers
    cache affinity regardless of WF_SYNC.

We are not very familiar with the networking code here,
so we would greatly appreciate any suggestions or advice from the community.

Thanks!
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index b37b664b6eb9..42ab9373194f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3611,7 +3611,7 @@ void sock_def_readable(struct sock *sk)
 	rcu_read_lock();
 	wq = rcu_dereference(sk->sk_wq);
 	if (skwq_has_sleeper(wq))
-		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+		wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
 						EPOLLRDNORM | EPOLLRDBAND);
 	sk_wake_async_rcu(sk, SOCK_WAKE_WAITD, POLL_IN);
 	rcu_read_unlock();
-- 
2.25.1
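
The placement decision described in the commit message can be illustrated with a small userspace model. This is only a sketch: the struct, enum and function names are hypothetical, the inputs are plain booleans rather than runqueue state, and the branch ordering is an assumption based on the "sync path when nr_running == 1" mentioned above, not a copy of the scheduler code. The result is also only a hint for the later idle-CPU search, not a hard binding.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs; the real scheduler derives these from runqueues. */
struct wake_ctx {
	bool sync;                    /* waker passed WF_SYNC */
	bool this_cpu_idle;           /* waker's CPU is idle (e.g. wakeup from an interrupt) */
	bool prev_cpu_idle;           /* wakee's previous CPU is idle */
	bool share_cache;             /* both CPUs share a cache */
	unsigned int this_nr_running; /* runnable tasks on the waker's CPU, waker included */
};

enum target { TARGET_THIS_CPU, TARGET_PREV_CPU, TARGET_NO_PREFERENCE };

/* Assumed model of the idle/sync wake-affine heuristic, not the kernel function. */
static enum target model_wake_affine_idle(const struct wake_ctx *c)
{
	if (c->this_cpu_idle && c->share_cache)
		return c->prev_cpu_idle ? TARGET_PREV_CPU : TARGET_THIS_CPU;

	/*
	 * Sync path: the waker is the only runnable task and claims it will
	 * sleep soon, so its CPU is treated as about to become idle.
	 */
	if (c->sync && c->this_nr_running == 1)
		return TARGET_THIS_CPU;

	if (c->prev_cpu_idle)
		return TARGET_PREV_CPU;

	return TARGET_NO_PREFERENCE;
}
```

In this model, a sender that is alone on its CPU pulls the receiver onto that CPU when sync is set, even if the receiver's previous CPU is idle. With sync cleared, the same inputs select the idle previous CPU instead.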

Re: [RFC PATCH] net/core: use wake_up_interruptible_poll() in sock_def_readable()
Posted by Xuewen Yan 4 days, 14 hours ago
Hello everyone,
Any comments about this?

Alternatively, we could drop the sync hint only when the wakeup comes from process context:
---
diff --git a/net/core/sock.c b/net/core/sock.c
index b37b664b6eb9..a46334266e86 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3610,9 +3610,14 @@ void sock_def_readable(struct sock *sk)

        rcu_read_lock();
        wq = rcu_dereference(sk->sk_wq);
-       if (skwq_has_sleeper(wq))
-               wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+       if (skwq_has_sleeper(wq)) {
+               if (in_interrupt())
+                       wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+                                               EPOLLRDNORM | EPOLLRDBAND);
+               else
+                       wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
                                                EPOLLRDNORM | EPOLLRDBAND);
+       }
        sk_wake_async_rcu(sk, SOCK_WAKE_WAITD, POLL_IN);
        rcu_read_unlock();
 }
@@ -3628,9 +3633,14 @@ static void sock_def_write_space(struct sock *sk)
         */
        if (sock_writeable(sk)) {
                wq = rcu_dereference(sk->sk_wq);
-               if (skwq_has_sleeper(wq))
-                       wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
+               if (skwq_has_sleeper(wq)) {
+                       if (in_interrupt())
+                               wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
+                                               EPOLLWRNORM | EPOLLWRBAND);
+                       else
+                               wake_up_interruptible_poll(&wq->wait, EPOLLOUT |
                                                EPOLLWRNORM | EPOLLWRBAND);
+               }

                /* Should agree with poll, otherwise some programs break */
                sk_wake_async_rcu(sk, SOCK_WAKE_SPACE, POLL_OUT);

On Tue, May 26, 2026 at 2:37 PM Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> sock_def_readable() currently uses wake_up_interruptible_sync_poll() to
> wake up tasks waiting for readable data on a socket. The _sync variant
> sets the WF_SYNC flag, which tells the scheduler that the waker will
> schedule away soon, so the wakee should stay on the same CPU to avoid
> needless cache bouncing.
>
> However, we observed the following call stack:
>  -vfs_write
>  -sock_write_iter
>  -unix_stream_sendmsg
>  -sock_def_readable
>  -__wake_up_sync_key
>
> In this process-context scenario, the waker does NOT go to sleep
> after the wakeup. With WF_SYNC, the scheduler is misled into placing
> the wakee on the waker's CPU (via wake_affine_idle()'s sync path when
> nr_running == 1), causing both the sender and receiver to contend for
> the same CPU. This may hurt throughput for IPC workloads on multi-core
> systems where the sender and receiver could otherwise run in parallel
> on different CPUs.
>
> Switch to wake_up_interruptible_poll() which does not set WF_SYNC.
> This allows the scheduler to freely migrate the wakee to an idle CPU,
> enabling true parallelism between the sending and receiving processes.
>
> Co-developed-by: Guohua Yan <guohua.yan@unisoc.com>
> Signed-off-by: Guohua Yan <guohua.yan@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
> Note:
> The possible cost is that for softirq callers (TCP/UDP receive path), the wakee
> may be migrated away from the current CPU where the received data is
> cache-hot. However, this is mitigated by:
>   - The scheduler's existing wake_affine logic which already considers
>     cache affinity regardless of WF_SYNC.
>
> We are not very familiar with the networking code here,
> so we would greatly appreciate any suggestions or advice from the community.
>
> Thanks!
> ---
>  net/core/sock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b37b664b6eb9..42ab9373194f 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3611,7 +3611,7 @@ void sock_def_readable(struct sock *sk)
>         rcu_read_lock();
>         wq = rcu_dereference(sk->sk_wq);
>         if (skwq_has_sleeper(wq))
> -               wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
> +               wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
>                                                 EPOLLRDNORM | EPOLLRDBAND);
>         sk_wake_async_rcu(sk, SOCK_WAKE_WAITD, POLL_IN);
>         rcu_read_unlock();
> --
> 2.25.1
>
Re: [RFC PATCH] net/core: use wake_up_interruptible_poll() in sock_def_readable()
Posted by Jiayuan Chen 1 week, 6 days ago
On 5/26/26 2:36 PM, Xuewen Yan wrote:
> sock_def_readable() currently uses wake_up_interruptible_sync_poll() to
> wake up tasks waiting for readable data on a socket. The _sync variant
> sets the WF_SYNC flag, which tells the scheduler that the waker will
> schedule away soon, so the wakee should stay on the same CPU to avoid
> needless cache bouncing.
>
> However, we observed the following call stack:
>   -vfs_write
>   -sock_write_iter
>   -unix_stream_sendmsg
>   -sock_def_readable
>   -__wake_up_sync_key
>
> In this process-context scenario, the waker does NOT go to sleep
> after the wakeup. With WF_SYNC, the scheduler is misled into placing
> the wakee on the waker's CPU (via wake_affine_idle()'s sync path when
> nr_running == 1), causing both the sender and receiver to contend for
> the same CPU. This may hurt throughput for IPC workloads on multi-core
> systems where the sender and receiver could otherwise run in parallel
> on different CPUs.


WF_SYNC isn't a hard binding.

What's your test environment and benchmark?
Re: [RFC PATCH] net/core: use wake_up_interruptible_poll() in sock_def_readable()
Posted by Xuewen Yan 1 week, 6 days ago
On Tue, May 26, 2026 at 5:08 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
>
> On 5/26/26 2:36 PM, Xuewen Yan wrote:
> > sock_def_readable() currently uses wake_up_interruptible_sync_poll() to
> > wake up tasks waiting for readable data on a socket. The _sync variant
> > sets the WF_SYNC flag, which tells the scheduler that the waker will
> > schedule away soon, so the wakee should stay on the same CPU to avoid
> > needless cache bouncing.
> >
> > However, we observed the following call stack:
> >   -vfs_write
> >   -sock_write_iter
> >   -unix_stream_sendmsg
> >   -sock_def_readable
> >   -__wake_up_sync_key
> >
> > In this process-context scenario, the waker does NOT go to sleep
> > after the wakeup. With WF_SYNC, the scheduler is misled into placing
> > the wakee on the waker's CPU (via wake_affine_idle()'s sync path when
> > nr_running == 1), causing both the sender and receiver to contend for
> > the same CPU. This may hurt throughput for IPC workloads on multi-core
> > systems where the sender and receiver could otherwise run in parallel
> > on different CPUs.
>
>
> WF_SYNC isn't a hard binding.
>
> What's your test environment and benchmark?
We tested the app installation speed on Android 17 with kernel 6.18.
After removing sync, we observed a significant improvement in
installation speed.

Thanks!
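
For anyone who wants to exercise the same write path outside of Android, a minimal userspace sketch is given below. It is not the benchmark used above: the function name stream_over_unix_socket() and the 4096-byte buffer are assumptions for illustration. The parent writes to an AF_UNIX stream socket with write(), so each send goes through the vfs_write and unix_stream_sendmsg path and the sender keeps running after the wakeup, while the child blocks in read().

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Stream `total` bytes in writes of at most `chunk` bytes over an AF_UNIX
 * stream socketpair. The parent is the sender, the child is the receiver.
 * Returns the byte count reported by the receiver, or -1 on error.
 */
static int64_t stream_over_unix_socket(size_t total, size_t chunk)
{
	char buf[4096];
	int sv[2], report[2];
	int64_t received = 0;
	size_t sent = 0;
	int ok = 1;
	pid_t pid;

	if (chunk == 0 || chunk > sizeof(buf))
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(report) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		close(report[0]);
		close(report[1]);
		return -1;
	}

	if (pid == 0) {
		/* Receiver: sleep in read() until data arrives, count until EOF. */
		ssize_t n;

		close(sv[0]);
		close(report[0]);
		while ((n = read(sv[1], buf, sizeof(buf))) > 0)
			received += n;
		if (write(report[1], &received, sizeof(received)) !=
		    (ssize_t)sizeof(received))
			_exit(1);
		_exit(0);
	}

	/* Sender: keeps running after every write, it never sleeps on purpose. */
	close(sv[1]);
	close(report[1]);
	memset(buf, 'x', sizeof(buf));
	while (sent < total) {
		size_t want = total - sent < chunk ? total - sent : chunk;
		ssize_t n = write(sv[0], buf, want);

		if (n <= 0) {
			ok = 0;
			break;
		}
		sent += (size_t)n;
	}
	close(sv[0]); /* signals EOF to the receiver */

	if (read(report[0], &received, sizeof(received)) !=
	    (ssize_t)sizeof(received))
		ok = 0;
	close(report[0]);
	waitpid(pid, NULL, 0);

	return ok ? received : -1;
}
```

Calling this from a small main() with a large `total` and timing the run on kernels with and without the change would be one way to compare the two wakeup variants. Any difference will depend on the CPU topology and on what else is running.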