io_uring fdinfo contains most of the runtime information, which is
helpful for debugging io_uring applications. However, it currently
lacks timeout-related information, so this patch adds the contents
of timeout_list.
--
changes since v1:
- Use the _irq variant of spin_lock.
- Fix formatting issues and delete redundant code.
- v1: https://lore.kernel.org/io-uring/20240812020052.8763-1-ruyi.zhang@samsung.com/
--
Signed-off-by: Ruyi Zhang <ruyi.zhang@samsung.com>
---
io_uring/fdinfo.c | 14 ++++++++++++++
io_uring/timeout.c | 12 ------------
io_uring/timeout.h | 12 ++++++++++++
3 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index d43e1b5fcb36..f524c3cd6f57 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -14,6 +14,7 @@
 #include "fdinfo.h"
 #include "cancel.h"
 #include "rsrc.h"
+#include "timeout.h"
 
 #ifdef CONFIG_PROC_FS
 static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
@@ -55,6 +56,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 	struct io_ring_ctx *ctx = file->private_data;
 	struct io_overflow_cqe *ocqe;
 	struct io_rings *r = ctx->rings;
+	struct io_timeout *timeout;
 	struct rusage sq_usage;
 	unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
 	unsigned int sq_head = READ_ONCE(r->sq.head);
@@ -235,5 +237,17 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 		seq_puts(m, "NAPI:\tdisabled\n");
 	}
 #endif
+
+	seq_puts(m, "TimeoutList:\n");
+	spin_lock_irq(&ctx->timeout_lock);
+	list_for_each_entry(timeout, &ctx->timeout_list, list) {
+		struct io_timeout_data *data;
+
+		data = cmd_to_io_kiocb(timeout)->async_data;
+		seq_printf(m, " off=%u, repeats=%u, sec=%lld, nsec=%ld\n",
+			   timeout->off, timeout->repeats, data->ts.tv_sec,
+			   data->ts.tv_nsec);
+	}
+	spin_unlock_irq(&ctx->timeout_lock);
 }
 #endif
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 9973876d91b0..4449e139e371 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -13,18 +13,6 @@
 #include "cancel.h"
 #include "timeout.h"
 
-struct io_timeout {
-	struct file *file;
-	u32 off;
-	u32 target_seq;
-	u32 repeats;
-	struct list_head list;
-	/* head of the link, used by linked timeouts only */
-	struct io_kiocb *head;
-	/* for linked completions */
-	struct io_kiocb *prev;
-};
-
 struct io_timeout_rem {
 	struct file *file;
 	u64 addr;
diff --git a/io_uring/timeout.h b/io_uring/timeout.h
index a6939f18313e..befd489a6286 100644
--- a/io_uring/timeout.h
+++ b/io_uring/timeout.h
@@ -1,5 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
 
+struct io_timeout {
+	struct file *file;
+	u32 off;
+	u32 target_seq;
+	u32 repeats;
+	struct list_head list;
+	/* head of the link, used by linked timeouts only */
+	struct io_kiocb *head;
+	/* for linked completions */
+	struct io_kiocb *prev;
+};
+
 struct io_timeout_data {
 	struct io_kiocb *req;
 	struct hrtimer timer;
--
2.43.0

On 9/25/24 09:58, Ruyi Zhang wrote:
> io_uring fdinfo contains most of the runtime information,which is
> helpful for debugging io_uring applications; However, there is
> currently a lack of timeout-related information, and this patch adds
> timeout_list information.

Please refer to unaddressed comments from v1. We can't have irqs
disabled for that long. And it's too verbose (i.e. depends on
the number of timeouts).

> --
> changes since v1:
> - use _irq version spin_lock.
> - Fixed formatting issues and delete redundant code.
> - v1 :https://lore.kernel.org/io-uring/20240812020052.8763-1-ruyi.zhang@samsung.com/
> --
>
> Signed-off-by: Ruyi Zhang <ruyi.zhang@samsung.com>

--
Pavel Begunkov

---
On 25 Sep 2024 12:58 Pavel Begunkov wrote
> On 9/25/24 09:58, Ruyi Zhang wrote:
>> io_uring fdinfo contains most of the runtime information,which is
>> helpful for debugging io_uring applications; However, there is
>> currently a lack of timeout-related information, and this patch adds
>> timeout_list information.

> Please refer to unaddressed comments from v1. We can't have irqs
> disabled for that long. And it's too verbose (i.e. depends on
> the number of timeouts).

Two questions:

1. I agree with you, we shouldn't walk a potentially very long list
under spinlock. but i can't find any other way to get all the timeout
information than to walk the timeout_list. Do you have any good ideas?

2. I also agree seq_printf heavier, if we use seq_put_decimal_ull and
seq_puts to concatenate strings, I haven't tested whether it's more
efficient or not, but the code is certainly not as readable as the
former. It's also possible that I don't fully understand what you mean
and want to hear your opinion.

---
Ruyi Zhang

On 10/10/24 10:20, Ruyi Zhang wrote:
> ---
> On 25 Sep 2024 12:58 Pavel Begunkov wrote
>> On 9/25/24 09:58, Ruyi Zhang wrote:
>>> io_uring fdinfo contains most of the runtime information,which is
>>> helpful for debugging io_uring applications; However, there is
>>> currently a lack of timeout-related information, and this patch adds
>>> timeout_list information.
>
>> Please refer to unaddressed comments from v1. We can't have irqs
>> disabled for that long. And it's too verbose (i.e. depends on
>> the number of timeouts).
>
> Two questions:
>
> 1. I agree with you, we shouldn't walk a potentially very long list
> under spinlock. but i can't find any other way to get all the timeout

If only it's just under the spin, but with disabled irqs...

> information than to walk the timeout_list. Do you have any good ideas?

In the long run it'd be great to replace the spinlock
with a mutex, i.e. just ->uring_lock, but that would might be
a bit involving as need to move handling to the task context.

> 2. I also agree seq_printf heavier, if we use seq_put_decimal_ull and
> seq_puts to concatenate strings, I haven't tested whether it's more
> efficient or not, but the code is certainly not as readable as the
> former. It's also possible that I don't fully understand what you mean
> and want to hear your opinion.

I don't think there is any difference, it'd be a matter of
doubling the number of in flight timeouts to achieve same
timings. Tell me, do you really have a good case where you
need that (pretty verbose)? Why not drgn / bpftrace it out
of the kernel instead?

--
Pavel Begunkov

---
On 2024-10-10 15:35 Pavel Begunkov wrote:
>> Two questions:
>>
>> 1. I agree with you, we shouldn't walk a potentially very
>> long list under spinlock. but i can't find any other way
>> to get all the timeout

> If only it's just under the spin, but with disabled irqs...

>> information than to walk the timeout_list. Do you have any
>> good ideas?

> In the long run it'd be great to replace the spinlock
> with a mutex, i.e. just ->uring_lock, but that would might be
> a bit involving as need to move handling to the task context.

Yes, it makes more sense to replace spin_lock, but that would
require other related logic to be modified, and I don't think
it's wise to do that for the sake of a piece of debugging
information.

>> 2. I also agree seq_printf heavier, if we use
>> seq_put_decimal_ull and seq_puts to concatenate strings,
>> I haven't tested whether it's more efficient or not, but
>> the code is certainly not as readable as the former. It's
>> also possible that I don't fully understand what you mean
>> and want to hear your opinion.

> I don't think there is any difference, it'd be a matter of
> doubling the number of in flight timeouts to achieve same
> timings. Tell me, do you really have a good case where you
> need that (pretty verbose)? Why not drgn / bpftrace it out
> of the kernel instead?

Of course, this information is available through existing tools.
But I think that most of the io_uring metadata has been exported
from the fdinfo file, and the purpose of adding the timeout
information is the same as before, easier to use. This way,
I don't have to write additional scripts to get all kinds of data.

And as far as I know, the io_uring_show_fdinfo function is
only called once when the user is viewing the
/proc/xxx/fdinfo/x file once. I don't think we normally need to
look at this file as often, and only look at it when the program
is abnormal, and the timeout_list is very long in the extreme case,
so I think the performance impact of adding this code is limited.

---
Ruyi Zhang
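
For context on the "viewing the /proc/xxx/fdinfo/x file" step above, here is a minimal userspace sketch of how a process might read the fdinfo text for one of its own descriptors. The helper names read_fdinfo() and fdinfo_has_field() are hypothetical and exist only for illustration. The "TimeoutList:" key is the one this patch proposes, so it would only be found on a kernel carrying the patch and only for an io_uring fd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read /proc/self/fdinfo/<fd> into buf and
 * NUL-terminate it. Returns the number of bytes read, or -1 if the
 * file cannot be opened (for example, fd is not open).
 */
static long read_fdinfo(int fd, char *buf, size_t len)
{
	char path[64];
	FILE *f;
	size_t n;

	assert(buf != NULL && len > 0);

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Each open of the file triggers one pass over the show callback. */
	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}

/*
 * Hypothetical helper: report whether a key such as "pos:" or the
 * proposed "TimeoutList:" appears anywhere in the fdinfo text.
 */
static int fdinfo_has_field(const char *buf, const char *key)
{
	return strstr(buf, key) != NULL;
}
```

A caller would pass the io_uring fd and then test for the field of interest, for example fdinfo_has_field(buf, "TimeoutList:"). Generic fields such as "pos:" and "flags:" are present for any descriptor.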

On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <ruyi.zhang@samsung.com> wrote:
>
> ---
> On 2024-10-10 15:35 Pavel Begunkov wrote:
> >> Two questions:
> >>
> >> 1. I agree with you, we shouldn't walk a potentially very
> >> long list under spinlock. but i can't find any other way
> >> to get all the timeout
>
> > If only it's just under the spin, but with disabled irqs...
>
> >> information than to walk the timeout_list. Do you have any
> >> good ideas?
>
> > In the long run it'd be great to replace the spinlock
> > with a mutex, i.e. just ->uring_lock, but that would might be
> > a bit involving as need to move handling to the task context.
>
> Yes, it makes more sense to replace spin_lock, but that would
> require other related logic to be modified, and I don't think
> it's wise to do that for the sake of a piece of debugging
> information.
>
> >> 2. I also agree seq_printf heavier, if we use
> >> seq_put_decimal_ull and seq_puts to concatenate strings,
> >> I haven't tested whether it's more efficient or not, but
> >> the code is certainly not as readable as the former. It's
> >> also possible that I don't fully understand what you mean
> >> and want to hear your opinion.
>
> > I don't think there is any difference, it'd be a matter of
> > doubling the number of in flight timeouts to achieve same
> > timings. Tell me, do you really have a good case where you
> > need that (pretty verbose)? Why not drgn / bpftrace it out
> > of the kernel instead?
>
> Of course, this information is available through existing tools.
> But I think that most of the io_uring metadata has been exported
> from the fdinfo file, and the purpose of adding the timeout
> information is the same as before, easier to use. This way,
> I don't have to write additional scripts to get all kinds of data.
>
> And as far as I know, the io_uring_show_fdinfo function is
> only called once when the user is viewing the
> /proc/xxx/fdinfo/x file once. I don't think we normally need to
> look at this file as often, and only look at it when the program
> is abnormal, and the timeout_list is very long in the extreme case,
> so I think the performance impact of adding this code is limited.

I do think it's useful, sometimes the only thing you have to poke at
after-the-fact is the fdinfo information. At the same time, would it be
more useful to dump _some_ of the info, even if we can't get all of it?
Would not be too hard to just stop dumping if need_resched() is set, and
even note that - you can always retry, as this info is generally grabbed
from the console anyway, not programmatically. That avoids the worst
possible scenario, which is a malicious setup with a shit ton of pending
timers, while still allowing it to be useful for a normal setup. And
this patch could just do that, rather than attempt to re-architect how
the timers are tracked and which locking it uses.

--
Jens Axboe

On 10/24/24 18:31, Jens Axboe wrote:
> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <ruyi.zhang@samsung.com> wrote:
...
>>> I don't think there is any difference, it'd be a matter of
>>> doubling the number of in flight timeouts to achieve same
>>> timings. Tell me, do you really have a good case where you
>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>> of the kernel instead?
>>
>> Of course, this information is available through existing tools.
>> But I think that most of the io_uring metadata has been exported
>> from the fdinfo file, and the purpose of adding the timeout
>> information is the same as before, easier to use. This way,
>> I don't have to write additional scripts to get all kinds of data.
>>
>> And as far as I know, the io_uring_show_fdinfo function is
>> only called once when the user is viewing the
>> /proc/xxx/fdinfo/x file once. I don't think we normally need to
>> look at this file as often, and only look at it when the program
>> is abnormal, and the timeout_list is very long in the extreme case,
>> so I think the performance impact of adding this code is limited.
>
> I do think it's useful, sometimes the only thing you have to poke at
> after-the-fact is the fdinfo information. At the same time, would it be

If you have an fd to print fdinfo, you can just well run drgn
or any other debugging tool. We keep pushing more debugging code
that can be extracted with bpf and other tools, and not only
it bloats the code, but potentially cripples the entire kernel.

> more useful to dump _some_ of the info, even if we can't get all of it?
> Would not be too hard to just stop dumping if need_resched() is set, and

need_resched() takes eternity in the eyes of hard irqs, that is
surely one way to make the system unusable. Will we even get the
request for rescheduling considering that irqs are off => timers
can't run?

> even note that - you can always retry, as this info is generally grabbed
> from the console anyway, not programmatically. That avoids the worst
> possible scenario, which is a malicious setup with a shit ton of pending
> timers, while still allowing it to be useful for a normal setup. And
> this patch could just do that, rather than attempt to re-architect how
> the timers are tracked and which locking it uses.

Or it can be done with one of the existing tools that already exist
specifically for that purpose, which don't need any additional kernel
and custom handling in the kernel, and users won't need to wait until
the patch lands into your kernel and can be run right away.

--
Pavel Begunkov

On 10/24/24 12:10 PM, Pavel Begunkov wrote:
> On 10/24/24 18:31, Jens Axboe wrote:
>> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <ruyi.zhang@samsung.com> wrote:
> ...
>>>> I don't think there is any difference, it'd be a matter of
>>>> doubling the number of in flight timeouts to achieve same
>>>> timings. Tell me, do you really have a good case where you
>>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>>> of the kernel instead?
>>>
>>> Of course, this information is available through existing tools.
>>> But I think that most of the io_uring metadata has been exported
>>> from the fdinfo file, and the purpose of adding the timeout
>>> information is the same as before, easier to use. This way,
>>> I don't have to write additional scripts to get all kinds of data.
>>>
>>> And as far as I know, the io_uring_show_fdinfo function is
>>> only called once when the user is viewing the
>>> /proc/xxx/fdinfo/x file once. I don't think we normally need to
>>> look at this file as often, and only look at it when the program
>>> is abnormal, and the timeout_list is very long in the extreme case,
>>> so I think the performance impact of adding this code is limited.
>>
>> I do think it's useful, sometimes the only thing you have to poke at
>> after-the-fact is the fdinfo information. At the same time, would it be
>
> If you have an fd to print fdinfo, you can just well run drgn
> or any other debugging tool. We keep pushing more debugging code
> that can be extracted with bpf and other tools, and not only
> it bloats the code, but potentially cripples the entire kernel.

While that is certainly true, it's also a much harder barrier to entry.
If you're already setup with eg drgn, then yeah fdinfo is useless as you
can grab much more info out by just using drgn.

I'm fine punting this to "needs more advanced debugging than fdinfo".
It's just important we get closure on these patches, so they don't
linger forever in no man's land.

--
Jens Axboe

On 10/25/24 00:25, Jens Axboe wrote:
> On 10/24/24 12:10 PM, Pavel Begunkov wrote:
>> On 10/24/24 18:31, Jens Axboe wrote:
>>> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <ruyi.zhang@samsung.com> wrote:
>> ...
>>>>> I don't think there is any difference, it'd be a matter of
>>>>> doubling the number of in flight timeouts to achieve same
>>>>> timings. Tell me, do you really have a good case where you
>>>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>>>> of the kernel instead?
>>>>
>>>> Of course, this information is available through existing tools.
>>>> But I think that most of the io_uring metadata has been exported
>>>> from the fdinfo file, and the purpose of adding the timeout
>>>> information is the same as before, easier to use. This way,
>>>> I don't have to write additional scripts to get all kinds of data.
>>>>
>>>> And as far as I know, the io_uring_show_fdinfo function is
>>>> only called once when the user is viewing the
>>>> /proc/xxx/fdinfo/x file once. I don't think we normally need to
>>>> look at this file as often, and only look at it when the program
>>>> is abnormal, and the timeout_list is very long in the extreme case,
>>>> so I think the performance impact of adding this code is limited.
>>>
>>> I do think it's useful, sometimes the only thing you have to poke at
>>> after-the-fact is the fdinfo information. At the same time, would it be
>>
>> If you have an fd to print fdinfo, you can just well run drgn
>> or any other debugging tool. We keep pushing more debugging code
>> that can be extracted with bpf and other tools, and not only
>> it bloats the code, but potentially cripples the entire kernel.
>
> While that is certainly true, it's also a much harder barrier to entry.
> If you're already setup with eg drgn, then yeah fdinfo is useless as you
> can grab much more info out by just using drgn.

drgn is simple, not that harder than patching fdinfo, we can add
liburing/scripts, and push it there so that don't need rewriting
it each time.

> I'm fine punting this to "needs more advanced debugging than fdinfo".
> It's just important we get closure on these patches, so they don't
> linger forever in no man's land.

The only option I see is to dump first ~5 and stop there, but
I still think the tooling option is better.

--
Pavel Begunkov
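
To make the "dump first ~5 and stop there" option concrete, here is a rough userspace sketch rather than kernel code. It walks a plain singly linked list that stands in for ctx->timeout_list and formats into a caller buffer that stands in for the seq_file. struct timeout_entry, MAX_DUMPED and dump_timeouts_capped() are all hypothetical names, and the cap of 5 is only the figure mentioned above. The point is that the work done per read is bounded by a constant instead of by the number of pending timeouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for one entry on ctx->timeout_list. */
struct timeout_entry {
	unsigned int off;
	unsigned int repeats;
	long long sec;
	long nsec;
	struct timeout_entry *next;
};

/* Assumed cap, taken from the "first ~5" figure in the thread. */
#define MAX_DUMPED 5

/*
 * Format at most MAX_DUMPED entries into buf. Sets *truncated when
 * entries were left out, either because the cap was hit or because
 * buf ran out of space. Returns the number of entries written.
 */
static unsigned int dump_timeouts_capped(const struct timeout_entry *head,
					 char *buf, size_t len,
					 bool *truncated)
{
	unsigned int dumped = 0;
	size_t used = 0;

	assert(buf != NULL && len > 0 && truncated != NULL);
	buf[0] = '\0';
	*truncated = false;

	for (; head; head = head->next) {
		int n;

		if (dumped == MAX_DUMPED) {
			/* More entries exist, but the cap is reached. */
			*truncated = true;
			break;
		}
		n = snprintf(buf + used, len - used,
			     "  off=%u, repeats=%u, sec=%lld, nsec=%ld\n",
			     head->off, head->repeats, head->sec, head->nsec);
		if (n < 0 || (size_t)n >= len - used) {
			/* Drop the partial line and stop early. */
			buf[used] = '\0';
			*truncated = true;
			break;
		}
		used += (size_t)n;
		dumped++;
	}
	return dumped;
}
```

A partial dump of this kind shows only the head of the list, so whether it is useful depends on which entries the person debugging needs to see.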

On 10/29/24 7:29 PM, Pavel Begunkov wrote:
> On 10/25/24 00:25, Jens Axboe wrote:
>> On 10/24/24 12:10 PM, Pavel Begunkov wrote:
>>> On 10/24/24 18:31, Jens Axboe wrote:
>>>> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <ruyi.zhang@samsung.com> wrote:
>>> ...
>>>>>> I don't think there is any difference, it'd be a matter of
>>>>>> doubling the number of in flight timeouts to achieve same
>>>>>> timings. Tell me, do you really have a good case where you
>>>>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>>>>> of the kernel instead?
>>>>>
>>>>> Of course, this information is available through existing tools.
>>>>> But I think that most of the io_uring metadata has been exported
>>>>> from the fdinfo file, and the purpose of adding the timeout
>>>>> information is the same as before, easier to use. This way,
>>>>> I don't have to write additional scripts to get all kinds of data.
>>>>>
>>>>> And as far as I know, the io_uring_show_fdinfo function is
>>>>> only called once when the user is viewing the
>>>>> /proc/xxx/fdinfo/x file once. I don't think we normally need to
>>>>> look at this file as often, and only look at it when the program
>>>>> is abnormal, and the timeout_list is very long in the extreme case,
>>>>> so I think the performance impact of adding this code is limited.
>>>>
>>>> I do think it's useful, sometimes the only thing you have to poke at
>>>> after-the-fact is the fdinfo information. At the same time, would it be
>>>
>>> If you have an fd to print fdinfo, you can just well run drgn
>>> or any other debugging tool. We keep pushing more debugging code
>>> that can be extracted with bpf and other tools, and not only
>>> it bloats the code, but potentially cripples the entire kernel.
>>
>> While that is certainly true, it's also a much harder barrier to entry.
>> If you're already setup with eg drgn, then yeah fdinfo is useless as you
>> can grab much more info out by just using drgn.
>
> drgn is simple, not that harder than patching fdinfo, we can add
> liburing/scripts, and push it there so that don't need rewriting
> it each time.

It's not that drgn it's hard to use, it's not, but that people aren't
necessarily aware of it. Once you've used it, yeah it's trivial. But for
the cases where you are stuck in prod and you haven't used anything like
that, it's a bit of a stretch to get there. Once it's part of your usual
arsenal of tools, not an issue at all.

Adding something to liburing/scripts/ would indeed be awesome.

>> I'm fine punting this to "needs more advanced debugging than fdinfo".
>> It's just important we get closure on these patches, so they don't
>> linger forever in no man's land.
>
> The only option I see is to dump first ~5 and stop there, but
> I still think the tooling option is better.

Let's just not do it at all, I think a partial dump is likely to be
potentially useless. And you can't cat it again and expect something
different if things are stuck.

--
Jens Axboe