This patch series introduces BPF iterators for wakeup_source, enabling
BPF programs to efficiently traverse a device's wakeup sources.

Currently, inspecting wakeup sources typically involves reading interfaces
like /sys/class/wakeup/* or debugfs. The repeated syscalls to query the
sysfs nodes are inefficient, as there can be hundreds of wakeup_sources, and
each wakeup source has multiple stats, with one sysfs node per stat.
debugfs is unstable and insecure.

The iterators utilize the pre-existing wakeup_sources_walk_* functions to
traverse the list of wakeup_sources under the SRCU that protects it.

Changes in v5:
- Add CO-RE definitions for *__local variables to fix s390 build per bpf-ci
- v4 link: https://lore.kernel.org/all/20260211235055.239024-1-wusamuel@google.com/

Changes in v4:
- Proper init of variables in ws_iter_check_sleep_times() test per bpf-ci
- Remove PM patch since it's already part of rebase
- v3 link: https://lore.kernel.org/all/20260124012133.2451708-1-wusamuel@google.com/

Changes in v3:
- Update wakeup_sources_walk_start() to handle an empty list per bpf-ci
- Simplify read of a struct's field in BPF program selftest per Andrii
- Drop open-coded iterators for wakeup_sources
- Fix condition from !get_ws_iter_stream to get_ws_iter_stream in selftest
- Read event_count instead of wakeup_count in selftest
- v2 link: https://lore.kernel.org/all/20260108225523.3268383-1-wusamuel@google.com/

Changes in v2:
- Guard BPF Makefile with CONFIG_PM_SLEEP to fix build errors
- Update copyright from 2025 to 2026
- v1 link: https://lore.kernel.org/all/20251204025003.3162056-1-wusamuel@google.com/

Samuel Wu (2):
  bpf: Add wakeup_source iterator
  selftests/bpf: Add tests for wakeup_sources

 kernel/bpf/Makefile                           |   3 +
 kernel/bpf/wakeup_source_iter.c               | 103 ++++++
 tools/testing/selftests/bpf/config            |   1 +
 .../bpf/prog_tests/wakeup_source_iter.c       | 300 ++++++++++++++++++
 .../selftests/bpf/progs/wakeup_source_iter.c  |  80 +++++
 5 files changed, 487 insertions(+)
 create mode 100644 kernel/bpf/wakeup_source_iter.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/wakeup_source_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/wakeup_source_iter.c

-- 
2.53.0.473.g4a7958ca14-goog
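[Editor's note: as a rough illustration of how such an iterator would be consumed, a BPF program attaching to it might look like the sketch below. The context struct name and fields (bpf_iter__wakeup_source, ctx->ws) are assumptions modeled on existing BPF iterators such as iter/task, not the actual patch contents.]

```c
/* Sketch only: section and field names below are assumptions inferred
 * from existing BPF iterators, not the actual patch. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("iter/wakeup_source")
int dump_wakeup_sources(struct bpf_iter__wakeup_source *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct wakeup_source *ws = ctx->ws;

	/* a NULL entry conventionally signals the end of the iteration */
	if (!ws)
		return 0;

	/* emit one text line per wakeup source into the seq_file */
	BPF_SEQ_PRINTF(seq, "%s %lu %lu\n",
		       ws->name, ws->event_count, ws->active_count);
	return 0;
}
```

User space would then read the text output from the pinned iterator file in one pass, instead of opening one sysfs node per stat.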
On Wed, Feb 25, 2026 at 1:08 PM Samuel Wu <wusamuel@google.com> wrote:
>
> This patch series introduces BPF iterators for wakeup_source, enabling
> BPF programs to efficiently traverse a device's wakeup sources.

[...]

Hi Andrii, any thoughts on this patchset? I've addressed your
suggestion from v2 of this patchset regarding selftest, but nothing
fundamental has changed since then.

Thanks!
Sam
On Thu, 5 Mar 2026 at 02:31, Samuel Wu <wusamuel@google.com> wrote:
>
> On Wed, Feb 25, 2026 at 1:08 PM Samuel Wu <wusamuel@google.com> wrote:
> >
> > This patch series introduces BPF iterators for wakeup_source, enabling
> > BPF programs to efficiently traverse a device's wakeup sources.

[...]

> Hi Andrii, any thoughts on this patchset? I've addressed your
> suggestion from v2 of this patchset regarding selftest, but nothing
> fundamental has changed since then.

I think it's still unclear why this cannot be achieved by exploring
what Andrii proposed initially in [0], i.e. by using existing
primitives and adding extra kfuncs where necessary to solve
bottlenecks, and I didn't see any indication that it was explored
sufficiently.

I think all we need is the wakeup source read_lock()/unlock() APIs as
KF_ACQUIRE/KF_RELEASE kfuncs. We just need to ensure each lock is
paired with an unlock by passing the index back, which can be an
opaque value returned from the kfunc (e.g. a dummy empty struct
pointer encoding the index). Then you can use bpf_for() to open-code
the list iteration.

The iterator would become unnecessary, and everything can be done from
a syscall program's BPF_PROG_RUN invocation. I didn't check how we can
get a pointer to the wakeup_sources LIST_HEAD, but you could try
several options (__ksym, bpf_kallsyms_lookup_name+bpf_core_cast and
then walking it, or returning a pointer from a new kfunc and walking
it using bpf_for()+bpf_core_cast()).

Also, how much speedup does this really provide? It is hard to
understand the motivation from the current cover letter. Is the
bottleneck due to opening multiple files in sysfs? You will still have
to read and parse text from the iterator's output in your current
solution. Wouldn't it be faster if you just used a syscall program and
streamed binary output to user space using a ring buffer?

[0]: https://lore.kernel.org/all/CAEf4BzZv9kCAmX0Fo4kR8qh9mu5NB-wLDmNAK_R=VzcxL7VPpQ@mail.gmail.com/

> Thanks!
> Sam
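[Editor's note: for concreteness, the alternative described here (a syscall program that looks up the list head by symbol, walks it with bpf_core_cast() accesses, and streams fixed-size binary records over a ring buffer) might be sketched roughly as below. The record layout, loop bound, and field names are illustrative, and the SRCU read section discussed later in the thread is deliberately omitted.]

```c
/* Illustrative sketch of the suggested open-coded approach; the
 * required SRCU locking is omitted here for brevity. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

struct ws_record {		/* binary record streamed to user space */
	char name[32];
	__u64 event_count;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} rb SEC(".maps");

SEC("syscall")
int dump_wakeup_sources(void *ctx)
{
	struct list_head *head, *pos;
	struct wakeup_source *ws;
	struct ws_record *rec;
	__u64 addr = 0;
	int i;

	/* find the static wakeup_sources list head by symbol name */
	if (bpf_kallsyms_lookup_name("wakeup_sources",
				     sizeof("wakeup_sources"), 0, &addr) || !addr)
		return 1;

	head = (struct list_head *)addr;
	pos = bpf_core_cast(head, struct list_head)->next;

	bpf_for(i, 0, 1024) {	/* arbitrary bound on list length */
		if (pos == head)
			break;
		/* wakeup_source embeds its list node in the 'entry' field */
		ws = container_of(pos, struct wakeup_source, entry);
		ws = bpf_core_cast(ws, struct wakeup_source);

		rec = bpf_ringbuf_reserve(&rb, sizeof(*rec), 0);
		if (!rec)
			break;
		bpf_probe_read_kernel_str(rec->name, sizeof(rec->name), ws->name);
		rec->event_count = ws->event_count;
		bpf_ringbuf_submit(rec, 0);

		pos = ws->entry.next;
	}
	return 0;
}
```

User space would trigger this via BPF_PROG_RUN and consume the ring buffer with libbpf's ring_buffer__poll(), parsing fixed-size structs instead of text.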
On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Thu, 5 Mar 2026 at 02:31, Samuel Wu <wusamuel@google.com> wrote:

[...]

> I think all we need is the wakeup source read_lock()/unlock() APIs as
> KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
> with unlock by passing the index, which can be an opaque value
> returned from the kfunc (e.g. dummy empty struct pointer encoding the
> index). Then you can use bpf_for() to open-code the list iteration.

I had a longer discussion with Paul on this. When there are multiple
open SRCU sections, mixing nested calls doesn't flatten the protection
scopes and leads to all kinds of issues, an example of which is in [0].

That said, in this case the verifier isn't tracking what the
underlying SRCU section protects, and isn't even aware of it; all it
needs to ensure is that the section is eventually closed. Out-of-order
unlocks are ok. The section would exist only to allow looking at the
list in a stable fashion using safe bpf_core_cast() accesses.

So, I just wanted to clarify why I think this should be perfectly ok.

[0]: https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html

(If you are like me, please follow the link at the bottom of [0]
before feeling good about understanding the example.)

[...]
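[Editor's note: the invariant described here (every acquired SRCU index must eventually be released, in any order, with no nesting semantics tracked) can be modeled in a few lines of plain userspace C. This is only an illustration of the verifier-level bookkeeping; none of the names below are kernel API.]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_READERS 8

/* Userspace model of the proposed scheme: ws_read_lock() hands out an
 * opaque token (here, just a slot index); ws_read_unlock() consumes a
 * token, in any order. The only invariant is that every token handed
 * out is eventually returned -- the pairing the verifier would enforce
 * via KF_ACQUIRE/KF_RELEASE. */
static bool in_use[MAX_READERS];

static int ws_read_lock(void)
{
	for (int i = 0; i < MAX_READERS; i++)
		if (!in_use[i]) {
			in_use[i] = true;
			return i;	/* the opaque token */
		}
	return -1;			/* no free slot */
}

static void ws_read_unlock(int tok)
{
	/* releasing an unheld or invalid token is a bug */
	assert(tok >= 0 && tok < MAX_READERS && in_use[tok]);
	in_use[tok] = false;
}

static int outstanding_sections(void)
{
	int n = 0;
	for (int i = 0; i < MAX_READERS; i++)
		n += in_use[i];
	return n;
}
```

Note that overlapping sections released out of order leave the model consistent, which is exactly why order-agnostic pairing suffices here.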
On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:

[...]

> > I think it's still unclear why this cannot be achieved by exploring
> > what Andrii proposed initially in [0], i.e. by using existing
> > primitives and adding extra kfuncs where necessary to solve
> > bottlenecks, and I didn't see any indication where it was explored
> > sufficiently.

This is my bad; I misinterpreted Andrii's original proposal as
applying only to open-coded BPF iterators, hence the follow-up patches
with them removed. I don't disagree this can be achieved via
open-coding. Both proposals are viable ways to traverse wakeup sources
via BPF, but I see two distinct advantages of the BPF iterator:

1. Iterators provide a stable kernel interface that decouples the BPF
code from the kernel's internal data structures and refactors.

2. If wakeup sources are accessible via BPF, the next step should be
to deprecate the existing interfaces for wakeup sources, such as the
sysfs and debugfs stats. In that case, a text-based BPF iterator
output will provide less friction during the transition.

> > I think all we need is the wakeup source read_lock()/unlock() APIs as
> > KF_ACQUIRE/KF_RELEASE kfuncs. [...]
>
> I had a longer discussion with Paul on this. When having multiple open
> SRCU sections, mixing nested calls doesn't flatten the protection
> scopes and leads to all kinds of issues, an example of which is in
> [0].

[...]

Appreciate the thoughtful feedback from you and Paul. The example was
nuanced but well explained by the post. However, a key difference I
see is that the BPF iterator solution holds one continuous locking
segment during traversal, as opposed to the hand-over-hand locking in
the example provided. I could be missing something fundamental here;
do you mind elaborating?

> > The iterator would become unnecessary, and everything can be done from
> > a syscall program's BPF_PROG_RUN invocation. I didn't
> > check how we can get the pointer to the wakeup_sources LIST_HEAD, but
> > you could try several options (__ksym,
> > bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> > pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).

Getting the static variable is key to the open-coded approach, but a
clean way forward isn't obvious to me. Passing around raw addresses
seems brittle and can raise security concerns. The wrapper kfunc to
get the static address might be the best option, but it feels like it
violates the power subsystem's encapsulation. The iterator approach
abstracts all of this cleanly.

> > Also, how much speed up does this really provide? It is hard to
> > understand the motivation from the current cover letter. Is the
> > bottleneck due to opening multiple files in sysfs? [...]

With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
seeing a 10x speedup when traversing wakeup sources; the speedup would
likely be even greater on a loaded system.

As alluded to earlier, the next step is to remove the
/sys/class/wakeup nodes and the debugfs nodes. My apologies for not
explicitly listing this in the cover letter, but ultimately this would
save memory by removing those nodes, eliminate the work of creating
them, and improve performance with a faster traversal, all while
transparently replacing legacy interfaces.

Thanks!
Sam

[...]
On Wed, 11 Mar 2026 at 01:23, Samuel Wu <wusamuel@google.com> wrote:
>
> On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:

[...]

> 1. Iterators provide a stable kernel interface that decouples the BPF
> code from the kernel's internal data structures and refactors

I think this is my concern: committing to a stable interface that
we'll have to maintain forever. It's a narrow use case that can be
better served by dedicated kfuncs and rolling your own solution. There
will be no end to the number of iterators if we keep making one for
every linked list in the kernel.

It would be different if the synchronization scheme for the
wakeup_sources list were really complicated and hard to express inline
in the verifier, but that is not the case.

> 2. If wakeup sources are accessible via BPF, the next step should be
> to deprecate the existing interfaces for wakeup sources, such as the
> sysfs and debugfs stats. In that case, a text-based BPF iterator
> output will provide less friction during the transition.

I would agree if this were a one-to-one conversion, where you just
replace the directory for lookup. But you'll be reading from one file
vs. several, so it's already a non-trivial change in the consumer. At
that point, the conversion burden doesn't have the same weight in
deciding whether to stick with a textual representation.

> Appreciate the thoughtful feedback from you and Paul. The example was
> nuanced but well-explained by the post, however, a key difference I
> see is that the BPF iterator solution creates continuous locking
> segments during traversal, as opposed to hand-over-hand locking in the
> example provided. I could be missing something fundamental here- do
> you mind elaborating?

It was just an illustration to identify what guarantees the verifier
would need to enforce. The most we need to do is ensure SRCU indices
are passed back during unlock, in whatever order. The critical section
will be using bpf_core_cast() anyway to do safe walking of the list.
With the KF_ACQUIRE/KF_RELEASE kfuncs you will still have a continuous
locking segment; just enclose your list-walking loop between lock()
and unlock().

> Getting the static variable is key to the open-coded approach, but a
> clean way forward isn't obvious to me. Passing around raw addresses
> seems brittle and can raise security concerns. [...]

I don't understand the security concern. All of this happens inside
the BPF program. If you're writing these addresses to user space and
leaking them, it's your bug. You can do that with iterators too; you
have all the raw addresses available to you.

> With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
> seeing a 10x speedup when traversing wakeup sources; the speedup would
> likely be even greater on a loaded system.

I'm sure you can replicate it with the approach described above. It
might even be faster than parsing text from the file.

> As alluded to earlier, the next step is to remove the
> /sys/class/wakeup node and debugfs nodes. [...]

And you can do that regardless of the type of solution chosen above.
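[Editor's note: put together, the shape Kumar describes (one continuous SRCU section enclosing a bpf_for() walk) would look roughly like the sketch below. The kfunc names and the opaque token type are hypothetical, since neither exists yet.]

```c
/* Hypothetical kfuncs -- these do not exist today; the names and the
 * opaque token type are assumptions for illustration only. */
struct ws_srcu_token;

extern struct ws_srcu_token *bpf_wakeup_sources_read_lock(void) __ksym;
extern void bpf_wakeup_sources_read_unlock(struct ws_srcu_token *tok) __ksym;

SEC("syscall")
int walk_wakeup_sources(void *ctx)
{
	struct ws_srcu_token *tok;
	int i;

	tok = bpf_wakeup_sources_read_lock();	/* KF_ACQUIRE */
	if (!tok)
		return 1;

	bpf_for(i, 0, 1024) {
		/* walk one list entry here, reading fields through
		 * bpf_core_cast() so accesses stay safe */
	}

	bpf_wakeup_sources_read_unlock(tok);	/* KF_RELEASE */
	return 0;
}
```

The verifier would reject any path that exits the program while `tok` is still held, which is the whole pairing guarantee needed here.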
On Wed, Mar 11, 2026 at 2:06 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 11 Mar 2026 at 01:23, Samuel Wu <wusamuel@google.com> wrote:

[...]

> I think this is my concern; committing to a stable interface that
> we'll have to maintain forever.
> It's a narrow use case that can be better served by dedicated kfuncs
> and rolling your own solution.
> There will be no end to the number of iterators if we keep making one
> for every linked list in the kernel.
>
> It would be different if the synchronization scheme for the
> wakeup_sources list was really complicated and hard to express inline
> in the verifier, but that is not the case.

I completely agree it's a viable approach; it seems to come down to
where, and with whom, the maintenance burden lands. Regardless, let me
explore how to prototype this cleanly, handling details like the
static address, and then we will have something concrete to compare. I
appreciate the alternative suggestion and the feedback here.

> I would agree if this was a one-to-one conversion, where you just
> replace the directory for look up.
> But you'll be reading from one file vs several, so it's already a
> non-trivial change in the consumer.
> At that point, the conversion burden doesn't have the same weight in
> deciding whether to stick with textual representation.

My understanding is that most (all?) current consumers ultimately want
the aggregate list of wakeup source stats, not a single node in
isolation; the reason there are multiple nodes is to adhere to the
precedent that sysfs nodes contain a singular stat. A single
text-based output node with the aggregated stats would be a
straightforward change that simplifies the userspace code consuming
these stats.

[...]