[PATCH mptcp-next 0/8] split get_subflow interface into two

Geliang Tang posted 8 patches 2 weeks, 4 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/cover.1732153672.git.tanggeliang@kylinos.cn
include/net/mptcp.h                           |  5 +--
net/mptcp/bpf.c                               | 13 ++++++--
net/mptcp/sched.c                             | 33 ++++++++++++-------
.../testing/selftests/bpf/prog_tests/mptcp.c  | 14 ++++++--
.../selftests/bpf/progs/mptcp_bpf_bkup.c      | 24 +++++++++++---
.../selftests/bpf/progs/mptcp_bpf_burst.c     | 32 ++++++++----------
.../selftests/bpf/progs/mptcp_bpf_first.c     | 24 +++++++++++---
.../selftests/bpf/progs/mptcp_bpf_red.c       | 24 +++++++++++---
.../selftests/bpf/progs/mptcp_bpf_rr.c        | 24 +++++++++++---
9 files changed, 136 insertions(+), 57 deletions(-)
[PATCH mptcp-next 0/8] split get_subflow interface into two
Posted by Geliang Tang 2 weeks, 4 days ago
From: Geliang Tang <tanggeliang@kylinos.cn>

I got the following error when running bpf burst sched selftests in auto-btf-debug mode:

[  167.258293][   T49] BUG: sleeping function called from invalid context at net/core/sock.c:3652
[  167.258981][   T49] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 49, name: kworker/0:2
[  167.259547][   T49] preempt_count: 0, expected: 0
[  167.259826][   T49] RCU nest depth: 1, expected: 0
[  167.259975][   T49] 5 locks held by kworker/0:2/49:
[  167.260137][   T49]  #0: ffff888001130948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7e4/0x16b0
[  167.260741][   T49]  #1: ffffc90000347d90 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: process_one_work+0xdf9/0x16b0
[  167.261206][   T49]  #2: ffff88800c620e58 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_worker+0x7b/0xad0
[  167.261476][   T49]  #3: ffffffff8977e620 (rcu_read_lock){....}-{1:2}, at: __bpf_prog_enter+0x1f/0x170
[  167.261742][   T49]  #4: ffff88800b6f8258 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: bpf_prog_87e3c1bd65224f1c_bpf_burst_get_subflow+0x12c/0x58b
[  167.262309][   T49] CPU: 0 UID: 0 PID: 49 Comm: kworker/0:2 Tainted: G           OE      6.12.0-rc6+ #9
[  167.262591][   T49] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  167.262774][   T49] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  167.262950][   T49] Workqueue: events mptcp_worker
[  167.263097][   T49] Call Trace:
[  167.263211][   T49]  <TASK>
[  167.263301][   T49]  dump_stack_lvl+0x9e/0xe0
[  167.263453][   T49]  __might_resched+0x35d/0x590
[  167.263599][   T49]  ? __pfx___might_resched+0x10/0x10
[  167.263829][   T49]  __lock_sock_fast+0x2f/0xd0
[  167.264035][   T49]  mptcp_pm_nl_subflow_chk_stale+0x1e2/0x3c0
[  167.264304][   T49]  ? bpf_prog_87e3c1bd65224f1c_bpf_burst_get_subflow+0x12c/0x58b
[  167.264604][   T49]  bpf_prog_87e3c1bd65224f1c_bpf_burst_get_subflow+0x12c/0x58b
[  167.264995][   T49]  ? mptcp_sched_get_retrans+0x1f8/0x330
[  167.265215][   T49]  ? __pfx_mptcp_sched_get_retrans+0x10/0x10
[  167.265432][   T49]  ? mark_lock+0x371/0x3c0
[  167.265678][   T49]  ? __mptcp_retrans+0xf4/0x9a0
[  167.265973][   T49]  ? mptcp_worker+0x7b/0xad0

mptcp_pm_subflow_chk_stale is a sleeping function, it shouldn't be
invoked under BPF rcu_read_lock.

One solution is to set mptcp_pm_subflow_chk_stale with KF_SLEEPABLE
flag:

BTF_ID_FLAGS(func, mptcp_pm_subflow_chk_stale, KF_SLEEPABLE)

Then set bpf_burst_get_subflow with BPF_F_SLEEPABLE:

      err = bpf_program__set_flags(skel->progs.bpf_burst_get_subflow,
                                   BPF_F_SLEEPABLE);

But another error ocurrs:

[  101.732098][    C0] BUG: sleeping function called from invalid context at kernel/bpf/trampoline.c:968
[  101.732781][    C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1640, name: test_progs-no_a
[  101.733477][    C0] preempt_count: 303, expected: 0
[  101.733793][    C0] RCU nest depth: 2, expected: 0
[  101.734043][    C0] 6 locks held by test_progs-no_a/1640:
[  101.734187][    C0] #0: ffff888007526e58 (sk_lock-AF_INET){+.+.}-{0:0}, at: sk_wait_data (net/core/sock.c:3131 (discriminator 2)) 
[  101.734440][    C0] #1: ffffffff95b7e620 (rcu_read_lock){....}-{1:2}, at: process_backlog (include/linux/local_lock_internal.h:38 (discriminator 1) net/core/dev.c:6115 (discriminator 1)) 
[  101.734728][    C0] #2: ffffffff95b7e620 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish (include/linux/rcupdate.h:337 (discriminator 1) include/linux/rcupdate.h:849 (discriminator 1) net/ipv4/ip_input.c:232 (discriminator 1)) 
[  101.735103][    C0] #3: ffff888008eb8e58 (k-slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv (include/linux/skbuff.h:1670 include/net/tcp.h:2530 net/ipv4/tcp_ipv4.c:2348) 
[  101.735560][    C0] #4: ffff8880075261d8 (slock-AF_INET){+.-.}-{2:2}, at: mptcp_incoming_options (net/mptcp/options.c:1053 net/mptcp/options.c:1203) 
[  101.736023][    C0] #5: ffffffff95b7d980 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable (include/linux/rcupdate.h:337 (discriminator 1) include/linux/rcupdate_trace.h:57 (discriminator 1) kernel/bpf/trampoline.c:966 (discriminator 1)) 
[  101.736325][    C0] CPU: 0 UID: 0 PID: 1640 Comm: test_progs-no_a Tainted: G           OE      6.12.0-rc7+ #5
[  101.736676][    C0] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  101.736940][    C0] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  101.737146][    C0] Call Trace:
[  101.737259][    C0]  <IRQ>
[  101.737331][    C0] dump_stack_lvl (lib/dump_stack.c:123) 
[  101.737489][    C0] __might_resched (kernel/sched/core.c:8657) 
[  101.737625][    C0] ? __lock_acquire (kernel/locking/lockdep.c:5202 (discriminator 1)) 
[  101.737772][    C0] ? __pfx___might_resched (kernel/sched/core.c:8611) 
[  101.737919][    C0] __might_fault (mm/memory.c:6715 (discriminator 1)) 
[  101.738051][    C0] __bpf_prog_enter_sleepable (include/linux/bpf.h:2089 (discriminator 1) kernel/bpf/trampoline.c:970 (discriminator 1)) 
[  101.738195][    C0] ? __bpf_prog_enter_sleepable (include/linux/rcupdate.h:337 (discriminator 1) include/linux/rcupdate_trace.h:57 (discriminator 1) kernel/bpf/trampoline.c:966 (discriminator 1)) 
[  101.738357][    C0] ? mptcp_sched_get_send (net/mptcp/sched.c:183) 
[  101.738511][    C0] ? __pfx_mptcp_sched_get_send (net/mptcp/sched.c:158) 
[  101.738641][    C0] ? __bpf_tramp_enter (include/linux/rcupdate.h:347 (discriminator 1) include/linux/rcupdate.h:880 (discriminator 1) include/linux/percpu-refcount.h:209 (discriminator 1) include/linux/percpu-refcount.h:222 (discriminator 1) kernel/bpf/trampoline.c:1010 (discriminator 1)) 
[  101.738774][    C0] ? bpf_trampoline_6442561300+0x31/0xd7 
[  101.738911][    C0] ? mptcp_sched_get_send (net/mptcp/sched.c:158) 
[  101.739067][    C0] ? __mptcp_subflow_push_pending (net/mptcp/protocol.c:1682 (discriminator 1)) 
[  101.739231][    C0] ? __pfx___mptcp_subflow_push_pending (net/mptcp/protocol.c:1655) 
[  101.739378][    C0] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:107 (discriminator 1) include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 1) include/linux/atomic/atomic-instrumented.h:1302 (discriminator 1) include/asm-generic/qspinlock.h:111 (discriminator 1) kernel/locking/spinlock_debug.c:116 (discriminator 1)) 
[  101.739556][    C0] ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114) 
[  101.739743][    C0] ? mptcp_incoming_options (net/mptcp/options.c:1067 net/mptcp/options.c:1203) 

This means get_send() interface of this scheduler can't be set with
BPF_F_SLEEPABLE flag since it's invoked in ack_update_msk() under mptcp
data lock.

So this patch has to split get_subflow() interface of packet scheduer into
two interfaces: get_send() and get_retrans(). Then we can set get_retrans()
interface alone with BPF_F_SLEEPABLE flag.

Thanks to Barry for helping me solve this issue.
Cc: Barry Song <baohua@kernel.org>

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/529

Geliang Tang (8):
  mptcp: split get_subflow interface into two
  Squash to "selftests/bpf: Add bpf_first scheduler & test"
  Squash to "selftests/bpf: Add bpf_bkup scheduler & test"
  Squash to "selftests/bpf: Add bpf_rr scheduler & test"
  Squash to "selftests/bpf: Add bpf_red scheduler & test"
  Squash to "selftests/bpf: Add bpf_burst scheduler & test"
  Squash to "bpf: Export mptcp packet scheduler helpers"
  Squash to "selftests/bpf: Add bpf_burst scheduler & test"

 include/net/mptcp.h                           |  5 +--
 net/mptcp/bpf.c                               | 13 ++++++--
 net/mptcp/sched.c                             | 33 ++++++++++++-------
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 14 ++++++--
 .../selftests/bpf/progs/mptcp_bpf_bkup.c      | 24 +++++++++++---
 .../selftests/bpf/progs/mptcp_bpf_burst.c     | 32 ++++++++----------
 .../selftests/bpf/progs/mptcp_bpf_first.c     | 24 +++++++++++---
 .../selftests/bpf/progs/mptcp_bpf_red.c       | 24 +++++++++++---
 .../selftests/bpf/progs/mptcp_bpf_rr.c        | 24 +++++++++++---
 9 files changed, 136 insertions(+), 57 deletions(-)

-- 
2.43.0
Re: [PATCH mptcp-next 0/8] split get_subflow interface into two
Posted by MPTCP CI 2 weeks, 4 days ago
Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/11945771301

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/86555cf65164
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=911397


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)