[PATCH bpf-next v5 0/2] Add wakeup_source iterators

Samuel Wu posted 2 patches 1 month ago
Posted by Samuel Wu 1 month ago
This patch series introduces BPF iterators for wakeup_source, enabling
BPF programs to efficiently traverse a device's wakeup sources.

Currently, inspecting wakeup sources typically involves reading interfaces
like /sys/class/wakeup/* or debugfs. The repeated syscalls needed to query
the sysfs nodes are inefficient: there can be hundreds of wakeup_sources,
and each wakeup source has multiple stats, with one sysfs node per stat.
debugfs, for its part, is unstable and insecure.

The iterators utilize the pre-existing wakeup_sources_walk_* functions to
traverse the SRCU-protected list of wakeup_sources.
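
For context, a BPF program consuming such an iterator would look roughly
like the sketch below. The context struct and field names here are
illustrative guesses, not taken verbatim from the patches:

```c
/* Hypothetical sketch only: bpf_iter__wakeup_source and its ws field
 * are assumed names, not necessarily those used in this series. */
SEC("iter/wakeup_source")
int dump_wakeup_sources(struct bpf_iter__wakeup_source *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct wakeup_source *ws = ctx->ws;

	if (!ws)
		return 0;

	/* One text line per wakeup source, in a single pass over the list. */
	BPF_SEQ_PRINTF(seq, "%s %lu %lu\n", ws->name,
		       ws->event_count, ws->active_count);
	return 0;
}
```

User space would then read the aggregated text from a single bpf_iter link
file instead of issuing one read per stat per wakeup source.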

Changes in v5:
 - Add CORE definitions for *__local variables to fix s390 build per bpf-ci
 - v4 link: https://lore.kernel.org/all/20260211235055.239024-1-wusamuel@google.com/

Changes in v4:
 - Proper init of variables in ws_iter_check_sleep_times() test per bpf-ci
 - Remove PM patch since it's already part of rebase
 - v3 link: https://lore.kernel.org/all/20260124012133.2451708-1-wusamuel@google.com/

Changes in v3:
 - Update wakeup_sources_walk_start() to handle an empty list per bpf-ci
 - Simplify read of a struct's field in BPF program selftest per Andrii
 - Drop open coded iterators for wakeup_sources
 - Fix condition from !get_ws_iter_stream to get_ws_iter_stream in selftest
 - Read event_count instead of wakeup_count in selftest
 - v2 link: https://lore.kernel.org/all/20260108225523.3268383-1-wusamuel@google.com/

Changes in v2:
 - Guard BPF Makefile with CONFIG_PM_SLEEP to fix build errors
 - Update copyright from 2025 to 2026
 - v1 link: https://lore.kernel.org/all/20251204025003.3162056-1-wusamuel@google.com/


Samuel Wu (2):
  bpf: Add wakeup_source iterator
  selftests/bpf: Add tests for wakeup_sources

 kernel/bpf/Makefile                           |   3 +
 kernel/bpf/wakeup_source_iter.c               | 103 ++++++
 tools/testing/selftests/bpf/config            |   1 +
 .../bpf/prog_tests/wakeup_source_iter.c       | 300 ++++++++++++++++++
 .../selftests/bpf/progs/wakeup_source_iter.c  |  80 +++++
 5 files changed, 487 insertions(+)
 create mode 100644 kernel/bpf/wakeup_source_iter.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/wakeup_source_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source_iter.c

-- 
2.53.0.473.g4a7958ca14-goog
Re: [PATCH bpf-next v5 0/2] Add wakeup_source iterators
Posted by Samuel Wu 4 weeks ago
On Wed, Feb 25, 2026 at 1:08 PM Samuel Wu <wusamuel@google.com> wrote:
>
> This patch series introduces BPF iterators for wakeup_source, enabling
> BPF programs to efficiently traverse a device's wakeup sources.
>
> Currently, inspecting wakeup sources typically involves reading interfaces
> like /sys/class/wakeup/* or debugfs. The repeated syscalls to query the
> sysfs nodes is inefficient, as there can be hundreds of wakeup_sources, and
> each wakeup source have multiple stats, with one sysfs node per stat.
> debugfs is unstable and insecure.
>
> The iterators utilize pre-existing wakeup_sources_walk_* functions to
> traverse over the SRCU that backs the list of wakeup_sources.
>
> Changes in v5:
>  - Add CORE definitions for *__local variables to fix s390 build per bpf-ci
>  - v4 link: https://lore.kernel.org/all/20260211235055.239024-1-wusamuel@google.com/
>
> Changes in v4:
>  - Proper init of variables in ws_iter_check_sleep_times() test per bpf-ci
>  - Remove PM patch since it's already part of rebase
>  - v3 link: https://lore.kernel.org/all/20260124012133.2451708-1-wusamuel@google.com/
>
> Changes in v3:
>  - Update wakeup_sources_walk_start() to handle an empty list per bpf-ci
>  - Simplify read of a struct's field in BPF program selftest per Andrii
>  - Drop open coded iterators for wakeup_sources
>  - Fix condition from !get_ws_iter_stream to get_ws_iter_stream in selftest
>  - Read event_count instead of wakeup_count in selftest
>  - v2 link: https://lore.kernel.org/all/20260108225523.3268383-1-wusamuel@google.com/
>
> Changes in v2:
>  - Guard BPF Makefile with CONFIG_PM_SLEEP to fix build errors
>  - Update copyright from 2025 to 2026
>  - v1 link: https://lore.kernel.org/all/20251204025003.3162056-1-wusamuel@google.com/
>
>
> Samuel Wu (2):
>   bpf: Add wakeup_source iterator
>   selftests/bpf: Add tests for wakeup_sources
>
>  kernel/bpf/Makefile                           |   3 +
>  kernel/bpf/wakeup_source_iter.c               | 103 ++++++
>  tools/testing/selftests/bpf/config            |   1 +
>  .../bpf/prog_tests/wakeup_source_iter.c       | 300 ++++++++++++++++++
>  .../selftests/bpf/progs/wakeup_source_iter.c  |  80 +++++
>  5 files changed, 487 insertions(+)
>  create mode 100644 kernel/bpf/wakeup_source_iter.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/wakeup_source_iter.c
>  create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source_iter.c
>
> --
> 2.53.0.473.g4a7958ca14-goog
>

Hi Andrii, any thoughts on this patchset? I've addressed your
suggestion from v2 regarding the selftest, but nothing fundamental has
changed since then.

Thanks!
Sam
Re: [PATCH bpf-next v5 0/2] Add wakeup_source iterators
Posted by Kumar Kartikeya Dwivedi 3 weeks, 5 days ago
On Thu, 5 Mar 2026 at 02:31, Samuel Wu <wusamuel@google.com> wrote:
>
> On Wed, Feb 25, 2026 at 1:08 PM Samuel Wu <wusamuel@google.com> wrote:
> >
> > This patch series introduces BPF iterators for wakeup_source, enabling
> > BPF programs to efficiently traverse a device's wakeup sources.
> >
> > Currently, inspecting wakeup sources typically involves reading interfaces
> > like /sys/class/wakeup/* or debugfs. The repeated syscalls to query the
> > sysfs nodes is inefficient, as there can be hundreds of wakeup_sources, and
> > each wakeup source have multiple stats, with one sysfs node per stat.
> > debugfs is unstable and insecure.
> >
> > The iterators utilize pre-existing wakeup_sources_walk_* functions to
> > traverse over the SRCU that backs the list of wakeup_sources.
> >
> > Changes in v5:
> >  - Add CORE definitions for *__local variables to fix s390 build per bpf-ci
> >  - v4 link: https://lore.kernel.org/all/20260211235055.239024-1-wusamuel@google.com/
> >
> > Changes in v4:
> >  - Proper init of variables in ws_iter_check_sleep_times() test per bpf-ci
> >  - Remove PM patch since it's already part of rebase
> >  - v3 link: https://lore.kernel.org/all/20260124012133.2451708-1-wusamuel@google.com/
> >
> > Changes in v3:
> >  - Update wakeup_sources_walk_start() to handle an empty list per bpf-ci
> >  - Simplify read of a struct's field in BPF program selftest per Andrii
> >  - Drop open coded iterators for wakeup_sources
> >  - Fix condition from !get_ws_iter_stream to get_ws_iter_stream in selftest
> >  - Read event_count instead of wakeup_count in selftest
> >  - v2 link: https://lore.kernel.org/all/20260108225523.3268383-1-wusamuel@google.com/
> >
> > Changes in v2:
> >  - Guard BPF Makefile with CONFIG_PM_SLEEP to fix build errors
> >  - Update copyright from 2025 to 2026
> >  - v1 link: https://lore.kernel.org/all/20251204025003.3162056-1-wusamuel@google.com/
> >
> >
> > Samuel Wu (2):
> >   bpf: Add wakeup_source iterator
> >   selftests/bpf: Add tests for wakeup_sources
> >
> >  kernel/bpf/Makefile                           |   3 +
> >  kernel/bpf/wakeup_source_iter.c               | 103 ++++++
> >  tools/testing/selftests/bpf/config            |   1 +
> >  .../bpf/prog_tests/wakeup_source_iter.c       | 300 ++++++++++++++++++
> >  .../selftests/bpf/progs/wakeup_source_iter.c  |  80 +++++
> >  5 files changed, 487 insertions(+)
> >  create mode 100644 kernel/bpf/wakeup_source_iter.c
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/wakeup_source_iter.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source_iter.c
> >
> > --
> > 2.53.0.473.g4a7958ca14-goog
> >
>
> Hi Andrii, any thoughts on this patchset? I've addressed your
> suggestion from v2 of this patchset regarding selftest, but nothing
> fundamental has changed since then.

I think it's still unclear why this cannot be achieved by exploring
what Andrii proposed initially in [0], i.e. by using existing
primitives and adding extra kfuncs where necessary to solve
bottlenecks; I didn't see any indication that this was explored
sufficiently.

I think all we need is the wakeup source read_lock()/unlock() APIs as
KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
with unlock by passing the index, which can be an opaque value
returned from the kfunc (e.g. dummy empty struct pointer encoding the
index). Then you can use bpf_for() to open-code the list iteration.
The iterator would become unnecessary, and everything can be done from
a syscall program's BPF_PROG_RUN invocation. I didn't
check how we can get the pointer to the wakeup_sources LIST_HEAD, but
you could try several options (__ksym,
bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).
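
Concretely, the shape I have in mind is something like the following.
Every name here is hypothetical; none of these kfuncs exist today:

```c
/* Hypothetical kfunc pair and syscall program; all names are made up. */
struct ws_srcu_token;	/* opaque, encodes the SRCU read-side index */

extern struct ws_srcu_token *bpf_wakeup_sources_read_lock(void) __ksym;
extern void bpf_wakeup_sources_read_unlock(struct ws_srcu_token *t) __ksym;
extern struct list_head *bpf_wakeup_sources_head(void) __ksym;

SEC("syscall")
int dump_ws(void *ctx)
{
	struct ws_srcu_token *tok = bpf_wakeup_sources_read_lock();
	struct list_head *head = bpf_wakeup_sources_head();
	struct list_head *pos = head;
	int i;

	bpf_for(i, 0, 1024) {
		struct wakeup_source *ws;

		pos = bpf_core_cast(pos->next, struct list_head);
		if (pos == head)
			break;
		ws = container_of(pos, struct wakeup_source, entry);
		/* ... stream binary stats to a ring buffer here ... */
	}
	bpf_wakeup_sources_read_unlock(tok);
	return 0;
}
```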

Also, how much speedup does this really provide? It is hard to
understand the motivation from the current cover letter. Is the
bottleneck due to opening multiple files in sysfs? You will still have
to read and parse text from the iterator's output in your current
solution. Wouldn't it be faster if you just used a syscall program,
and used binary output (using a ring buffer) to stream data to user
space?
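
For instance, even with the iterator, the consumer still ends up doing
per-line text parsing along these lines (the "name event_count
active_count" line format is a hypothetical example of what an
iterator's text output could look like):

```c
#include <stdio.h>

/* A wakeup-source stat record as user space might reconstruct it. */
struct ws_stat {
	char name[64];
	unsigned long event_count;
	unsigned long active_count;
};

/* Parse one line of the hypothetical text output; returns 0 on
 * success, -1 on a malformed line. A binary ring-buffer record of
 * fixed layout would make this step unnecessary. */
static int parse_ws_line(const char *line, struct ws_stat *out)
{
	if (sscanf(line, "%63s %lu %lu", out->name,
		   &out->event_count, &out->active_count) != 3)
		return -1;
	return 0;
}
```

A binary record streamed over a ring buffer is consumed with a single
memcpy-style read, with no per-field conversion cost.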

  [0]: https://lore.kernel.org/all/CAEf4BzZv9kCAmX0Fo4kR8qh9mu5NB-wLDmNAK_R=VzcxL7VPpQ@mail.gmail.com/

>
> Thanks!
> Sam
>
Re: [PATCH bpf-next v5 0/2] Add wakeup_source iterators
Posted by Kumar Kartikeya Dwivedi 3 weeks, 4 days ago
On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Thu, 5 Mar 2026 at 02:31, Samuel Wu <wusamuel@google.com> wrote:
> >
> > On Wed, Feb 25, 2026 at 1:08 PM Samuel Wu <wusamuel@google.com> wrote:
> > >
> > > This patch series introduces BPF iterators for wakeup_source, enabling
> > > BPF programs to efficiently traverse a device's wakeup sources.
> > >
> > > Currently, inspecting wakeup sources typically involves reading interfaces
> > > like /sys/class/wakeup/* or debugfs. The repeated syscalls to query the
> > > sysfs nodes is inefficient, as there can be hundreds of wakeup_sources, and
> > > each wakeup source have multiple stats, with one sysfs node per stat.
> > > debugfs is unstable and insecure.
> > >
> > > The iterators utilize pre-existing wakeup_sources_walk_* functions to
> > > traverse over the SRCU that backs the list of wakeup_sources.
> > >
> > > Changes in v5:
> > >  - Add CORE definitions for *__local variables to fix s390 build per bpf-ci
> > >  - v4 link: https://lore.kernel.org/all/20260211235055.239024-1-wusamuel@google.com/
> > >
> > > Changes in v4:
> > >  - Proper init of variables in ws_iter_check_sleep_times() test per bpf-ci
> > >  - Remove PM patch since it's already part of rebase
> > >  - v3 link: https://lore.kernel.org/all/20260124012133.2451708-1-wusamuel@google.com/
> > >
> > > Changes in v3:
> > >  - Update wakeup_sources_walk_start() to handle an empty list per bpf-ci
> > >  - Simplify read of a struct's field in BPF program selftest per Andrii
> > >  - Drop open coded iterators for wakeup_sources
> > >  - Fix condition from !get_ws_iter_stream to get_ws_iter_stream in selftest
> > >  - Read event_count instead of wakeup_count in selftest
> > >  - v2 link: https://lore.kernel.org/all/20260108225523.3268383-1-wusamuel@google.com/
> > >
> > > Changes in v2:
> > >  - Guard BPF Makefile with CONFIG_PM_SLEEP to fix build errors
> > >  - Update copyright from 2025 to 2026
> > >  - v1 link: https://lore.kernel.org/all/20251204025003.3162056-1-wusamuel@google.com/
> > >
> > >
> > > Samuel Wu (2):
> > >   bpf: Add wakeup_source iterator
> > >   selftests/bpf: Add tests for wakeup_sources
> > >
> > >  kernel/bpf/Makefile                           |   3 +
> > >  kernel/bpf/wakeup_source_iter.c               | 103 ++++++
> > >  tools/testing/selftests/bpf/config            |   1 +
> > >  .../bpf/prog_tests/wakeup_source_iter.c       | 300 ++++++++++++++++++
> > >  .../selftests/bpf/progs/wakeup_source_iter.c  |  80 +++++
> > >  5 files changed, 487 insertions(+)
> > >  create mode 100644 kernel/bpf/wakeup_source_iter.c
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/wakeup_source_iter.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source_iter.c
> > >
> > > --
> > > 2.53.0.473.g4a7958ca14-goog
> > >
> >
> > Hi Andrii, any thoughts on this patchset? I've addressed your
> > suggestion from v2 of this patchset regarding selftest, but nothing
> > fundamental has changed since then.
>
> I think it's still unclear why this cannot be achieved by exploring
> what Andrii proposed initially in [0], i.e. by using existing
> primitives and adding extra kfuncs where necessary to solve
> bottlenecks, and I didn't see any indication where it was explored
> sufficiently.
>
> I think all we need is the wakeup source read_lock()/unlock() APIs as
> KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
> with unlock by passing the index, which can be an opaque value
> returned from the kfunc (e.g. dummy empty struct pointer encoding the
> index). Then you can use bpf_for() to open-code the list iteration.

I had a longer discussion with Paul on this. When having multiple open
SRCU sections, mixing nested calls doesn't flatten the protection
scopes and leads to all kinds of issues; an example is in [0]. That
said, in this case the verifier isn't tracking what the underlying
SRCU section protects (it isn't even aware of it). All it needs to
ensure is that each section is eventually closed; out-of-order unlocks
are ok. The section would exist only to allow looking at the list in a
stable fashion using safe bpf_core_cast() accesses.

So, I just wanted to clarify why I think this should be perfectly ok.

  [0]: https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html

(if you are like me, please follow the link at the bottom of [0]
before feeling good about understanding the example).
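
To make the "out-of-order unlocks are ok" point concrete, at the SRCU
level this is simply the following (kernel-side sketch, using the power
subsystem's wakeup_srcu for illustration):

```c
/* Sketch: overlapping SRCU read-side sections may be closed in any
 * order, as long as each returned index is eventually passed back. */
int i1 = srcu_read_lock(&wakeup_srcu);
int i2 = srcu_read_lock(&wakeup_srcu);

srcu_read_unlock(&wakeup_srcu, i1);	/* out of order: fine */
srcu_read_unlock(&wakeup_srcu, i2);
```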

> The iterator would become unnecessary, and everything can be done from
> a syscall program's BPF_PROG_RUN invocation. I didn't
> check how we can get the pointer to the wakeup_sources LIST_HEAD, but
> you could try several options (__ksym,
> bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).
>
> Also, how much speed up does this really provide? It is hard to
> understand the motivation from the current cover letter. Is the
> bottleneck due to opening multiple files in sysfs? You will still have
> to read text and parse text from the iterator's output in your current
> solution. Wouldn't it be faster if you just used a syscall program,
> and used binary output (using a ring buffer) to stream data to user
> space?
>
>   [0]: https://lore.kernel.org/all/CAEf4BzZv9kCAmX0Fo4kR8qh9mu5NB-wLDmNAK_R=VzcxL7VPpQ@mail.gmail.com/
>
> >
> > Thanks!
> > Sam
> >
Re: [PATCH bpf-next v5 0/2] Add wakeup_source iterators
Posted by Samuel Wu 3 weeks, 1 day ago
On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:

[...]

> > I think it's still unclear why this cannot be achieved by exploring
> > what Andrii proposed initially in [0], i.e. by using existing
> > primitives and adding extra kfuncs where necessary to solve
> > bottlenecks, and I didn't see any indication where it was explored
> > sufficiently.
> >
This is my bad: I misinterpreted Andrii's original proposal as
applying only to open-coded BPF iterators, hence the follow-up patches
with them removed. I don't disagree that this can be achieved via
open-coding. Both proposals are viable ways to traverse wakeup sources
via BPF; however, I see two distinct advantages of the BPF iterator:

1. Iterators provide a stable kernel interface that decouples the BPF
code from the kernel's internal data structures and refactoring.
2. If wakeup sources are accessible via BPF, the next step should be
to deprecate the existing interfaces for wakeup sources, such as the
sysfs and debugfs stats. In that case, a text-based BPF iterator
output will provide less friction during the transition.

> > I think all we need is the wakeup source read_lock()/unlock() APIs as
> > KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
> > with unlock by passing the index, which can be an opaque value
> > returned from the kfunc (e.g. dummy empty struct pointer encoding the
> > index). Then you can use bpf_for() to open-code the list iteration.
>
> I had a longer discussion with Paul on this. When having multiple open
> SRCU sections, mixing nested calls doesn't flatten the protection
> scopes and leads to all kinds of issues, an example of which is in
> [0]. That said, in this case, the verifier isn't tracking what the
> underlying SRCU section protects or isn't even aware of it, all it
> needs to ensure is that section is closed eventually, out-of-order
> unlocks are ok, it would just exist to allow looking at the list in a
> stable fashion using safe bpf_core_cast() accesses.
>
> So, I just wanted to clarify why I think this should be perfectly ok.
>
>   [0]: https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html
>
> (if you are like me, please follow the link at the bottom of [0]
> before feeling good about understanding the example).

Appreciate the thoughtful feedback from you and Paul. The example was
nuanced but well explained by the post. However, a key difference I
see is that the BPF iterator solution creates continuous locking
segments during traversal, as opposed to the hand-over-hand locking in
the example provided. I could be missing something fundamental here;
do you mind elaborating?

>
> > The iterator would become unnecessary, and everything can be done from
> > a syscall program's BPF_PROG_RUN invocation. I didn't
> > check how we can get the pointer to the wakeup_sources LIST_HEAD, but
> > you could try several options (__ksym,
> > bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> > pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).
> >

Getting at the static variable is key to the open-coded approach, but
a clean way forward isn't obvious to me. Passing around raw addresses
seems brittle and can raise security concerns. A wrapper kfunc to
return the static address might be the best option, but it feels like
it violates the power subsystem's encapsulation. The iterator approach
abstracts all of this cleanly.

> > Also, how much speed up does this really provide? It is hard to
> > understand the motivation from the current cover letter. Is the
> > bottleneck due to opening multiple files in sysfs? You will still have
> > to read text and parse text from the iterator's output in your current
> > solution. Wouldn't it be faster if you just used a syscall program,
> > and used binary output (using a ring buffer) to stream data to user
> > space?

With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
seeing a 10x speedup when traversing wakeup sources; the speedup would
likely be even greater on a loaded system.

As alluded to earlier, the next step is to remove the
/sys/class/wakeup node and debugfs nodes. My apologies for not
explicitly listing this in the cover letter, but ultimately this would
save memory by removing those nodes, eliminate work by not creating
the nodes, and improve performance with a faster traversal, all while
transparently replacing legacy interfaces.

Thanks!
Sam

[...]
Re: [PATCH bpf-next v5 0/2] Add wakeup_source iterators
Posted by Kumar Kartikeya Dwivedi 3 weeks, 1 day ago
On Wed, 11 Mar 2026 at 01:23, Samuel Wu <wusamuel@google.com> wrote:
>
> On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> [...]
>
> > > I think it's still unclear why this cannot be achieved by exploring
> > > what Andrii proposed initially in [0], i.e. by using existing
> > > primitives and adding extra kfuncs where necessary to solve
> > > bottlenecks, and I didn't see any indication where it was explored
> > > sufficiently.
> > >
> This is my bad, I misinterpreted/misunderstood Andrii's original
> proposal to apply only to open-coded BPF iterators, hence the follow
> up patches with them removed. However, I don't disagree this can be
> achieved via open-coding. Both proposals are viable to traverse wakeup
> sources via BPF, however I see two distinct advantages of the BPF
> iterator:
>
> 1. Iterators provide a stable kernel interface that decouples the BPF
> code from the kernel's internal data structures and refactors

I think this is my concern: committing to a stable interface that
we'll have to maintain forever.
It's a narrow use case that can be better served by dedicated kfuncs
and rolling your own solution.
There will be no end to the number of iterators if we keep adding one
for every linked list in the kernel.

It would be different if the synchronization scheme for the
wakeup_sources list was really complicated and hard to express inline
in the verifier, but that is not the case.

> 2. If wakeup sources are accessible via BPF, the next step should be
> to deprecate the existing interfaces for wakeup sources, such as the
> sysfs and debugfs stats. In that case, a text-based BPF iterator
> output will provide less friction during the transition.
>

I would agree if this was a one-to-one conversion, where you just
replace the directory used for lookup.
But you'll be reading from one file instead of several, so it's
already a non-trivial change in the consumer.
At that point, the conversion burden doesn't have the same weight in
deciding whether to stick with a textual representation.

> > > I think all we need is the wakeup source read_lock()/unlock() APIs as
> > > KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
> > > with unlock by passing the index, which can be an opaque value
> > > returned from the kfunc (e.g. dummy empty struct pointer encoding the
> > > index). Then you can use bpf_for() to open-code the list iteration.
> >
> > I had a longer discussion with Paul on this. When having multiple open
> > SRCU sections, mixing nested calls doesn't flatten the protection
> > scopes and leads to all kinds of issues, an example of which is in
> > [0]. That said, in this case, the verifier isn't tracking what the
> > underlying SRCU section protects or isn't even aware of it, all it
> > needs to ensure is that section is closed eventually, out-of-order
> > unlocks are ok, it would just exist to allow looking at the list in a
> > stable fashion using safe bpf_core_cast() accesses.
> >
> > So, I just wanted to clarify why I think this should be perfectly ok.
> >
> >   [0]: https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html
> >
> > (if you are like me, please follow the link at the bottom of [0]
> > before feeling good about understanding the example).
>
> Appreciate the thoughtful feedback from you and Paul. The example was
> nuanced but well-explained by the post, however, a key difference I
> see is that the BPF iterator solution creates continuous locking
> segments during traversal, as opposed to hand-over-hand locking in the
> example provided. I could be missing something fundamental here- do
> you mind elaborating?

It was just an illustration to identify what guarantees the verifier
would need to enforce.
The most we need to do is ensure SRCU indices are passed back during
unlock, in whatever order.
The critical section will use bpf_core_cast() anyway to do safe
walking of the list.
With the KF_ACQUIRE/KF_RELEASE kfuncs you will still have a continuous
locking segment.
Just enclose your list-walking loop between lock() and unlock().

>
> >
> > > The iterator would become unnecessary, and everything can be done from
> > > a syscall program's BPF_PROG_RUN invocation. I didn't
> > > check how we can get the pointer to the wakeup_sources LIST_HEAD, but
> > > you could try several options (__ksym,
> > > bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> > > pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).
> > >
>
> Getting the static variable is key to the open-coded approach, but a
> clean way forward isn't obvious to me. Passing around raw addresses
> seems brittle and can raise security concerns. The wrapper kfunc to

I don't understand the security concern. All of this happens inside
the BPF program.
If you're writing these addresses to user space and leaking them, it's your bug.
You can do that with iterators too. You have all the raw addresses
available to you.

> get the static address might be the best option, but it feels like it
> violates the power subsystem's encapsulation. The iterator approach
> abstracts all of this cleanly.
>
> > > Also, how much speed up does this really provide? It is hard to
> > > understand the motivation from the current cover letter. Is the
> > > bottleneck due to opening multiple files in sysfs? You will still have
> > > to read text and parse text from the iterator's output in your current
> > > solution. Wouldn't it be faster if you just used a syscall program,
> > > and used binary output (using a ring buffer) to stream data to user
> > > space?
>
> With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
> seeing a 10x speedup when traversing wakeup sources; the speedup would
> likely be even greater on a loaded system.
>

I'm sure you can replicate it with the approach described above.
It might even be faster than parsing text from the file.

> As alluded to earlier, the next step is to remove the
> /sys/class/wakeup node and debugfs nodes. My apologies for not
> explicitly listing this in the cover letter, but ultimately this would
> save memory by removing those nodes, eliminate work by not creating
> the nodes, and improve performance with a faster traversal, all while
> transparently replacing legacy interfaces.

And you can do that regardless of the type of solution chosen above.

>
> Thanks!
> Sam
>
> [...]
Re: [PATCH bpf-next v5 0/2] Add wakeup_source iterators
Posted by Samuel Wu 2 weeks, 6 days ago
On Wed, Mar 11, 2026 at 2:06 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, 11 Mar 2026 at 01:23, Samuel Wu <wusamuel@google.com> wrote:
> >
> > On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > [...]
> >
> > > > I think it's still unclear why this cannot be achieved by exploring
> > > > what Andrii proposed initially in [0], i.e. by using existing
> > > > primitives and adding extra kfuncs where necessary to solve
> > > > bottlenecks, and I didn't see any indication where it was explored
> > > > sufficiently.
> > > >
> > This is my bad, I misinterpreted/misunderstood Andrii's original
> > proposal to apply only to open-coded BPF iterators, hence the follow
> > up patches with them removed. However, I don't disagree this can be
> > achieved via open-coding. Both proposals are viable to traverse wakeup
> > sources via BPF, however I see two distinct advantages of the BPF
> > iterator:
> >
> > 1. Iterators provide a stable kernel interface that decouples the BPF
> > code from the kernel's internal data structures and refactors
>
> I think this is my concern; committing to a stable interface that
> we'll have to maintain forever.
> It's a narrow use case that can be better served by dedicated kfuncs
> and rolling your own solution.
> There will be no end to the number of iterators if we keep making one
> for every linked list in the kernel.
>
> It would be different if the synchronization scheme for the
> wakeup_sources list was really complicated and hard to express inline
> in the verifier, but that is not the case.
>

I completely agree it's a viable approach; it seems to come down to
where/who carries the maintenance burden. Regardless, let me explore
how to prototype this cleanly, handling details like the static
address, and then we will have something concrete for comparison. I
appreciate the alternative suggestion and the feedback here.

> > 2. If wakeup sources are accessible via BPF, the next step should be
> > to deprecate the existing interfaces for wakeup sources, such as the
> > sysfs and debugfs stats. In that case, a text-based BPF iterator
> > output will provide less friction during the transition.
> >
>
> I would agree if this was a one-to-one conversion, where you just
> replace the directory for look up.
> But you'll be reading from one file vs several, so it's already a
> non-trivial change in the consumer.
> At that point, the conversion burden doesn't have the same weight in
> deciding whether to stick with textual representation.

My understanding is that most (all?) current consumers ultimately want
the aggregate list of wakeup source stats, not a single node in
isolation; the reason there are multiple nodes is to adhere to the
precedent that a sysfs node contains a single stat. A single
text-based output node with the aggregated stats would be a
straightforward change that simplifies the userspace code consuming
these stats.

> > > > I think all we need is the wakeup source read_lock()/unlock() APIs as
> > > > KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
> > > > with unlock by passing the index, which can be an opaque value
> > > > returned from the kfunc (e.g. dummy empty struct pointer encoding the
> > > > index). Then you can use bpf_for() to open-code the list iteration.
> > >
> > > I had a longer discussion with Paul on this. When having multiple open
> > > SRCU sections, mixing nested calls doesn't flatten the protection
> > > scopes and leads to all kinds of issues, an example of which is in
> > > [0]. That said, in this case, the verifier isn't tracking what the
> > > underlying SRCU section protects or isn't even aware of it, all it
> > > needs to ensure is that section is closed eventually, out-of-order
> > > unlocks are ok, it would just exist to allow looking at the list in a
> > > stable fashion using safe bpf_core_cast() accesses.
> > >
> > > So, I just wanted to clarify why I think this should be perfectly ok.
> > >
> > >   [0]: https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html
> > >
> > > (if you are like me, please follow the link at the bottom of [0]
> > > before feeling good about understanding the example).
> >
> > Appreciate the thoughtful feedback from you and Paul. The example was
> > nuanced but well-explained by the post, however, a key difference I
> > see is that the BPF iterator solution creates continuous locking
> > segments during traversal, as opposed to hand-over-hand locking in the
> > example provided. I could be missing something fundamental here- do
> > you mind elaborating?
>
> It was just an illustration to identify what guarantees the verifier
> would need to enforce.
> The most we need to do is just ensure SRCU indices are passed back
> during unlock, in whatever order.
> Critical section will be using bpf_core_cast() anyway to do safe
> walking of the list.
> With the KF_ACQUIRE/KF_RELEASE kfuncs you will still have a continuous
> locking segment.
> Just enclose your list walking loop between lock() and unlock().
>
> >
> > >
> > > > The iterator would become unnecessary, and everything can be done from
> > > > a syscall program's BPF_PROG_RUN invocation. I didn't
> > > > check how we can get the pointer to the wakeup_sources LIST_HEAD, but
> > > > you could try several options (__ksym,
> > > > bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> > > > pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).
> > > >
> >
> > Getting the static variable is key to the open-coded approach, but a
> > clean way forward isn't obvious to me. Passing around raw addresses
> > seems brittle and can raise security concerns. The wrapper kfunc to
>
> I don't understand the security concern. All of this happens inside
> the BPF program.
> If you're writing these addresses to user space and leaking them, it's your bug.
> You can do that with iterators too. You have all the raw addresses
> available to you.
>
> > get the static address might be the best option, but it feels like it
> > violates the power subsystem's encapsulation. The iterator approach
> > abstracts all of this cleanly.
> >
> > > > Also, how much speed up does this really provide? It is hard to
> > > > understand the motivation from the current cover letter. Is the
> > > > bottleneck due to opening multiple files in sysfs? You will still have
> > > > to read text and parse text from the iterator's output in your current
> > > > solution. Wouldn't it be faster if you just used a syscall program,
> > > > and used binary output (using a ring buffer) to stream data to user
> > > > space?
> >
> > With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
> > seeing a 10x speedup when traversing wakeup sources; the speedup would
> > likely be even greater on a loaded system.
> >
>
> I'm sure you can replicate it with the approach described above.
> It might even be faster than parsing text from the file.
>
> > As alluded to earlier, the next step is to remove the
> > /sys/class/wakeup node and debugfs nodes. My apologies for not
> > explicitly listing this in the cover letter, but ultimately this would
> > save memory by removing those nodes, eliminate work by not creating
> > the nodes, and improve performance with a faster traversal, all while
> > transparently replacing legacy interfaces.
>
> And you can do that regardless of the type of solution chosen above.
>
> >
> > Thanks!
> > Sam
> >
> > [...]