This series makes atomic code patching possible in riscv ftrace. A
direct benefit of this is that we can get rid of stop_machine() when
patching function entries. This also makes it possible to run ftrace
with full kernel preemption. Before this series, the kernel initializes
patchable function entries to NOP4 + NOP4. To start tracing, it updates
entries to AUIPC + JALR while holding other cores in stop_machine.
stop_machine() is required because it is impossible to update 2
instructions and have them be seen atomically. And preemption must be
prevented, as kernel preemption allows a process to be scheduled out
while executing one of these instruction pairs.

This series addresses the problem by initializing the first NOP4 to
AUIPC. So, atomic patching is possible because the kernel only has to
update one instruction. As long as the instruction is naturally aligned,
it is expected to be updated atomically.

However, the address range of the ftrace trampoline is limited to +-2K
from ftrace_caller after applying this series. This issue is expected
to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
data in front of patchable functions and can use it to direct execution
out to any custom trampolines.

The series is composed of three parts. The first part cleans up
existing issues when the kernel is compiled with Clang. The second part
modifies the ftrace code patching mechanism (2-4) as mentioned above.
The third prepares ftrace to run with kernel preemption (5, 6).

An ongoing fix:

Since there is no room for marking *kernel_text_address as notrace [1]
at the source code level, there is a significant performance regression
when using function_graph with TRACE_IRQFLAGS enabled. There can be as
many as 8 graph handlers called on each function entry. The current
workaround requires us to echo "*kernel_text_address" into
set_ftrace_notrace before starting the trace. However, we observed that
the kernel still enables the patch sites in some cases even with
*kernel_text_address properly added to the file. While the root cause
is still under investigation, we consider that it should not hold back
the code patching, in order to unblock the call_ops part.

[1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/

Changes in v3:
- Add a fix tag for patch 1
- Add a data fence before sending out remote fence.i (6)
- Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/

Changes in v2:
- Drop patch 1 as it is merged through fixes.
- Drop patch 2, which converts kernel_text_address into notrace, as
  users can prevent tracing it by configuring the tracefs.
- Use a more generic way in kconfig to align functions.
- Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com

Andy Chiu (7):
  riscv: ftrace: support fastcc in Clang for WITH_ARGS
  riscv: ftrace: align patchable functions to 4 Byte boundary
  riscv: ftrace: prepare ftrace for atomic code patching
  riscv: ftrace: do not use stop_machine to update code
  riscv: vector: Support calling schedule() for preemptible Vector
  riscv: add a data fence for CMODX in the kernel mode
  riscv: ftrace: support PREEMPT

 arch/riscv/Kconfig                 |   4 +-
 arch/riscv/include/asm/ftrace.h    |  11 +++
 arch/riscv/include/asm/processor.h |   5 ++
 arch/riscv/include/asm/vector.h    |  22 ++++-
 arch/riscv/kernel/asm-offsets.c    |   7 ++
 arch/riscv/kernel/ftrace.c         | 133 ++++++++++++-----------------
 arch/riscv/kernel/mcount-dyn.S     |  25 ++++--
 arch/riscv/mm/cacheflush.c         |  15 +++-
 8 files changed, 135 insertions(+), 87 deletions(-)
---
base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
--
2.39.3 (Apple Git-145)
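For illustration, here is a minimal C sketch of the patching scheme
described above (the helper names and encodings are illustrative
assumptions, not the series' actual arch/riscv/kernel/ftrace.c code):

#include <stdbool.h>
#include <stdint.h>

#define NOP4 0x00000013u  /* addi x0, x0, 0 */

/* jalr ra, lo12(ra): opcode 0x67, rd = ra (x1), rs1 = ra (x1) */
static uint32_t jalr_ra(int32_t lo12)
{
        return ((uint32_t)(lo12 & 0xfff) << 20) | (1u << 15) |
               (1u << 7) | 0x67u;
}

/*
 * With the first slot of a patchable entry permanently set to
 * AUIPC ra, hi20(ftrace_caller), enabling or disabling tracing only
 * rewrites the second, naturally aligned 4-byte slot.
 */
static void toggle_callsite(volatile uint32_t *second_insn, bool enable,
                            int32_t lo12)
{
        /* One aligned 32-bit store: a hart concurrently executing here
         * sees either the old or the new instruction, never a mix, so
         * no stop_machine() is needed. */
        *second_insn = enable ? jalr_ra(lo12) : NOP4;

        /* The kernel must still synchronize instruction caches after
         * this (fence.i on all harts; see patch 6). */
}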
Adding Steven.

Andy Chiu <andybnac@gmail.com> writes:

> This series makes atomic code patching possible in riscv ftrace. A
> direct benefit of this is that we can get rid of stop_machine() when
> patching function entries. This also makes it possible to run ftrace
> with full kernel preemption. Before this series, the kernel initializes
> patchable function entries to NOP4 + NOP4. To start tracing, it updates
> entries to AUIPC + JALR while holding other cores in stop_machine.
> stop_machine() is required because it is impossible to update 2
> instructions and have them be seen atomically. And preemption must be
> prevented, as kernel preemption allows a process to be scheduled out
> while executing one of these instruction pairs.
>
> This series addresses the problem by initializing the first NOP4 to
> AUIPC. So, atomic patching is possible because the kernel only has to
> update one instruction. As long as the instruction is naturally aligned,
> it is expected to be updated atomically.
>
> However, the address range of the ftrace trampoline is limited to +-2K
> from ftrace_caller after applying this series. This issue is expected
> to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> data in front of patchable functions and can use it to direct execution
> out to any custom trampolines.
>
> The series is composed of three parts. The first part cleans up
> existing issues when the kernel is compiled with Clang. The second part
> modifies the ftrace code patching mechanism (2-4) as mentioned above.
> The third prepares ftrace to run with kernel preemption (5, 6).
>
> An ongoing fix:
>
> Since there is no room for marking *kernel_text_address as notrace [1]
> at the source code level, there is a significant performance regression
> when using function_graph with TRACE_IRQFLAGS enabled. There can be as
> many as 8 graph handlers called on each function entry. The current
> workaround requires us to echo "*kernel_text_address" into
> set_ftrace_notrace before starting the trace. However, we observed that
> the kernel still enables the patch sites in some cases even with
> *kernel_text_address properly added to the file. While the root cause
> is still under investigation, we consider that it should not hold back
> the code patching, in order to unblock the call_ops part.

Maybe Steven knows this from the top of his head!

As Andy points out, "*kernel_text_address" is used in the stack
unwinding on RISC-V. So, if you do a tracing without filtering *and*
TRACE_IRQFLAGS, one will drown in traces.

E.g. the ftrace selftest:
| $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc

will generate a lot of traces.

Now, if we add:
--8<--
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
index ff88f97e41fb..4f30a4d81d99 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
@@ -84,6 +84,7 @@ cd $INSTANCE2
 do_test '*rcu*' 'rcu'
 cd $WD
 cd $INSTANCE3
+echo '*kernel_text_address' > set_ftrace_notrace
 echo function_graph > current_tracer
 
 sleep 1
-->8--

The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
but still enables the *kernel_text_address traces. (Note that there are
no filters in the test, so *all* ftrace recs will be enabled.)

Are we holding the graph tracer wrong?


Happy thanksgiving!
Björn
On Wed, 27 Nov 2024 22:25:57 +0100
Björn Töpel <bjorn@kernel.org> wrote:

> Adding Steven.

And this has been in my draft folder for almost a month :-p

I kept coming back to this email, but then got distracted by something
else.

>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > This series makes atomic code patching possible in riscv ftrace. A
> > direct benefit of this is that we can get rid of stop_machine() when
> > patching function entries. This also makes it possible to run ftrace
> > with full kernel preemption. Before this series, the kernel initializes
> > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > entries to AUIPC + JALR while holding other cores in stop_machine.
> > stop_machine() is required because it is impossible to update 2
> > instructions and have them be seen atomically. And preemption must be
> > prevented, as kernel preemption allows a process to be scheduled out
> > while executing one of these instruction pairs.
> >
> > This series addresses the problem by initializing the first NOP4 to
> > AUIPC. So, atomic patching is possible because the kernel only has to
> > update one instruction. As long as the instruction is naturally aligned,
> > it is expected to be updated atomically.
> >
> > However, the address range of the ftrace trampoline is limited to +-2K
> > from ftrace_caller after applying this series. This issue is expected
> > to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> > data in front of patchable functions and can use it to direct execution
> > out to any custom trampolines.
> >
> > The series is composed of three parts. The first part cleans up
> > existing issues when the kernel is compiled with Clang. The second part
> > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > The third prepares ftrace to run with kernel preemption (5, 6).
> >
> > An ongoing fix:
> >
> > Since there is no room for marking *kernel_text_address as notrace [1]
> > at the source code level, there is a significant performance regression
> > when using function_graph with TRACE_IRQFLAGS enabled. There can be as
> > many as 8 graph handlers called on each function entry. The current
> > workaround requires us to echo "*kernel_text_address" into
> > set_ftrace_notrace before starting the trace. However, we observed that
> > the kernel still enables the patch sites in some cases even with
> > *kernel_text_address properly added to the file. While the root cause
> > is still under investigation, we consider that it should not hold back
> > the code patching, in order to unblock the call_ops part.
>
> Maybe Steven knows this from the top of his head!
>
> As Andy points out, "*kernel_text_address" is used in the stack
> unwinding on RISC-V. So, if you do a tracing without filtering *and*
> TRACE_IRQFLAGS, one will drown in traces.

I tested set_ftrace_notrace on x86 and the function graph tracer does
honor it. I wonder if there's a kernel

> E.g. the ftrace selftest:
> | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
>
> will generate a lot of traces.
>
> Now, if we add:
> --8<--
> diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> index ff88f97e41fb..4f30a4d81d99 100644
> --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> @@ -84,6 +84,7 @@ cd $INSTANCE2
>  do_test '*rcu*' 'rcu'
>  cd $WD
>  cd $INSTANCE3
> +echo '*kernel_text_address' > set_ftrace_notrace
>  echo function_graph > current_tracer
>
>  sleep 1
> -->8--
>
> The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> but still enables the *kernel_text_address traces. (Note that there are
> no filters in the test, so *all* ftrace recs will be enabled.)
>
> Are we holding the graph tracer wrong?

What do you get when you do:

  # grep kernel_text_address available_filter_functions

?

>
>
> Happy thanksgiving!
> Björn

And Merry Christmas!

-- Steve
On Tue, Dec 24, 2024 at 11:15 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 27 Nov 2024 22:25:57 +0100
> Björn Töpel <bjorn@kernel.org> wrote:
>
> > Adding Steven.
>
> And this has been in my draft folder for almost a month :-p
>
> I kept coming back to this email, but then got distracted by something
> else.
>
> >
> > Andy Chiu <andybnac@gmail.com> writes:
> >
> > > This series makes atomic code patching possible in riscv ftrace. A
> > > direct benefit of this is that we can get rid of stop_machine() when
> > > patching function entries. This also makes it possible to run ftrace
> > > with full kernel preemption. Before this series, the kernel initializes
> > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > stop_machine() is required because it is impossible to update 2
> > > instructions and have them be seen atomically. And preemption must be
> > > prevented, as kernel preemption allows a process to be scheduled out
> > > while executing one of these instruction pairs.
> > >
> > > This series addresses the problem by initializing the first NOP4 to
> > > AUIPC. So, atomic patching is possible because the kernel only has to
> > > update one instruction. As long as the instruction is naturally aligned,
> > > it is expected to be updated atomically.
> > >
> > > However, the address range of the ftrace trampoline is limited to +-2K
> > > from ftrace_caller after applying this series. This issue is expected
> > > to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> > > data in front of patchable functions and can use it to direct execution
> > > out to any custom trampolines.
> > >
> > > The series is composed of three parts. The first part cleans up
> > > existing issues when the kernel is compiled with Clang. The second part
> > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > The third prepares ftrace to run with kernel preemption (5, 6).
> > >
> > > An ongoing fix:
> > >
> > > Since there is no room for marking *kernel_text_address as notrace [1]
> > > at the source code level, there is a significant performance regression
> > > when using function_graph with TRACE_IRQFLAGS enabled. There can be as
> > > many as 8 graph handlers called on each function entry. The current
> > > workaround requires us to echo "*kernel_text_address" into
> > > set_ftrace_notrace before starting the trace. However, we observed that
> > > the kernel still enables the patch sites in some cases even with
> > > *kernel_text_address properly added to the file. While the root cause
> > > is still under investigation, we consider that it should not hold back
> > > the code patching, in order to unblock the call_ops part.
> >
> > Maybe Steven knows this from the top of his head!
> >
> > As Andy points out, "*kernel_text_address" is used in the stack
> > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > TRACE_IRQFLAGS, one will drown in traces.
>
> I tested set_ftrace_notrace on x86 and the function graph tracer does
> honor it. I wonder if there's a kernel

After checking the log buffer, I can confirm that riscv also honors
set_ftrace_notrace. It does not print out "*kernel_text_address" in
the trace buffer. Sorry for not making this clear.

However, the problem is that the patch sites of "*kernel_text_address"
were enabled, allowing the code to flow into ftrace and parts of
fgraph. In particular, with TRACE_IRQFLAGS enabled, the functions are
called extensively, causing a significant slowdown.

IIUC, it is reasonable to enable a patch site if there is at least one
tracing instance that requests the function. However, in another
experiment, the patch sites of "*kernel_text_address" are enabled
even when all instances have them disabled. Here is a way to reproduce
it:

cd /sys/kernel/
mount -t tracefs t tracing/
cd tracing/
mkdir instances/1
mkdir instances/2
cd instances/1/
echo *kernel_text_address > set_ftrace_notrace
echo *sched* > set_ftrace_filter # this is just to make the kernel more responsive
cat set_ftrace_notrace
echo function_graph > current_tracer
cd ../2/
echo *kernel_text_address > set_ftrace_notrace
cat set_ftrace_notrace
echo function_graph > current_tracer

To figure out this problem, it is probably worth trying to understand
why the patch site is enabled after the above sequence:

It seems like the ftrace_ops being passed into
__ftrace_hash_rec_update() at the start of the second tracer has an
empty notrace hash. This leads to the eventual activation of the patch
sites. However, graph_ops does not have an empty notrace hash when
the user starts the first graph tracer. I am trying to understand this
part of the code, and the relationship between graph_ops and ops in
the fgraph_array.

>
> >
> > E.g. the ftrace selftest:
> > | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> >
> > will generate a lot of traces.
> >
> > Now, if we add:
> > --8<--
> > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > index ff88f97e41fb..4f30a4d81d99 100644
> > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > @@ -84,6 +84,7 @@ cd $INSTANCE2
> >  do_test '*rcu*' 'rcu'
> >  cd $WD
> >  cd $INSTANCE3
> > +echo '*kernel_text_address' > set_ftrace_notrace
> >  echo function_graph > current_tracer
> >
> >  sleep 1
> > -->8--
> >
> > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > but still enables the *kernel_text_address traces. (Note that there are
> > no filters in the test, so *all* ftrace recs will be enabled.)
> >
> > Are we holding the graph tracer wrong?
>
> What do you get when you do:
>
>   # grep kernel_text_address available_filter_functions
>
> ?

We get:
kernel_text_address
__kernel_text_address

>
> >
> >
> > Happy thanksgiving!
> > Björn
>
> And Merry Christmas!
>
> -- Steve
>

And Happy New Year!


Andy
On Mon, Dec 30, 2024 at 3:08 AM Andy Chiu <andybnac@gmail.com> wrote:
>
> On Tue, Dec 24, 2024 at 11:15 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 27 Nov 2024 22:25:57 +0100
> > Björn Töpel <bjorn@kernel.org> wrote:
> >
> > > Adding Steven.
> >
> > And this has been in my draft folder for almost a month :-p
> >
> > I kept coming back to this email, but then got distracted by something else.
> >
> > >
> > > Andy Chiu <andybnac@gmail.com> writes:
> > >
> > > > This series makes atomic code patching possible in riscv ftrace. A
> > > > direct benefit of this is that we can get rid of stop_machine() when
> > > > patching function entries. This also makes it possible to run ftrace
> > > > with full kernel preemption. Before this series, the kernel initializes
> > > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > > stop_machine() is required because it is impossible to update 2
> > > > instructions and have them be seen atomically. And preemption must be
> > > > prevented, as kernel preemption allows a process to be scheduled out
> > > > while executing one of these instruction pairs.
> > > >
> > > > This series addresses the problem by initializing the first NOP4 to
> > > > AUIPC. So, atomic patching is possible because the kernel only has to
> > > > update one instruction. As long as the instruction is naturally aligned,
> > > > it is expected to be updated atomically.
> > > >
> > > > However, the address range of the ftrace trampoline is limited to +-2K
> > > > from ftrace_caller after applying this series. This issue is expected
> > > > to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> > > > data in front of patchable functions and can use it to direct execution
> > > > out to any custom trampolines.
> > > >
> > > > The series is composed of three parts. The first part cleans up
> > > > existing issues when the kernel is compiled with Clang. The second part
> > > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > > The third prepares ftrace to run with kernel preemption (5, 6).
> > > >
> > > > An ongoing fix:
> > > >
> > > > Since there is no room for marking *kernel_text_address as notrace [1]
> > > > at the source code level, there is a significant performance regression
> > > > when using function_graph with TRACE_IRQFLAGS enabled. There can be as
> > > > many as 8 graph handlers called on each function entry. The current
> > > > workaround requires us to echo "*kernel_text_address" into
> > > > set_ftrace_notrace before starting the trace. However, we observed that
> > > > the kernel still enables the patch sites in some cases even with
> > > > *kernel_text_address properly added to the file. While the root cause
> > > > is still under investigation, we consider that it should not hold back
> > > > the code patching, in order to unblock the call_ops part.
> > >
> > > Maybe Steven knows this from the top of his head!
> > >
> > > As Andy points out, "*kernel_text_address" is used in the stack
> > > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > > TRACE_IRQFLAGS, one will drown in traces.
> >
> > I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> > it. I wonder if there's a kernel
>
> After checking the log buffer, I can confirm that riscv also honors
> set_ftrace_notrace. It does not print out "*kernel_text_address"
> in the trace buffer. Sorry for not making this clear.
>
> However, the problem is that the patch sites of "*kernel_text_address"
> were enabled, allowing the code to flow into ftrace and parts of fgraph.
> In particular, with TRACE_IRQFLAGS enabled, the functions are called
> extensively, causing a significant slowdown.
>
> IIUC, it is reasonable to enable a patch site if there is at least one
> tracing instance that requests the function. However, in another
> experiment, the patch sites of "*kernel_text_address" are enabled
> even when all instances have them disabled. Here is a way to reproduce
> it:
>
> cd /sys/kernel/
> mount -t tracefs t tracing/
> cd tracing/
> mkdir instances/1
> mkdir instances/2
> cd instances/1/
> echo *kernel_text_address > set_ftrace_notrace
> echo *sched* > set_ftrace_filter # this is just to make the kernel more responsive
> cat set_ftrace_notrace
> echo function_graph > current_tracer
> cd ../2/
> echo *kernel_text_address > set_ftrace_notrace
> cat set_ftrace_notrace
> echo function_graph > current_tracer
>
> To figure out this problem, it is probably worth trying to understand
> why the patch site is enabled after the above sequence:
>
> It seems like the ftrace_ops being passed into
> __ftrace_hash_rec_update() at the start of the second tracer has an
> empty notrace hash. This leads to the eventual activation of the patch
> sites. However, graph_ops does not have an empty notrace hash when
> the user starts the first graph tracer. I am trying to understand this
> part of the code, and the relationship between graph_ops and ops in
> the fgraph_array.
Here is a possible reason: the updated notrace hash for the manager's
ops does not reflect the intersection of all users' notrace hashes.
According to the comment in ftrace_startup_subops():

 * o If either notrace_hash is empty then the final stays empty
 * o Otherwise, the final is an intersection between the hashes

It seems that the wrong arguments were passed to intersect_hash():
the filter hashes were passed instead of the notrace hashes. As a
result, even if a function is marked as notrace in all ftrace
instances, the notrace hash in the manager's ops would not contain
it, leaving the call sites of the notrace'd function enabled. If I
understand correctly, we should pass the notrace hashes instead of
the filter hashes, as in the following modification:
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4c28dd177ca6..3a194559d220 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3515,18 +3515,17 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int
 		   ftrace_hash_empty(subops->func_hash->notrace_hash)) {
 		notrace_hash = EMPTY_HASH;
 	} else {
-		size_bits = max(ops->func_hash->filter_hash->size_bits,
-				subops->func_hash->filter_hash->size_bits);
+		size_bits = max(ops->func_hash->notrace_hash->size_bits,
+				subops->func_hash->notrace_hash->size_bits);
 		notrace_hash = alloc_ftrace_hash(size_bits);
 		if (!notrace_hash) {
-			free_ftrace_hash(filter_hash);
+			free_ftrace_hash(notrace_hash);
 			return -ENOMEM;
 		}
 
-		ret = intersect_hash(&notrace_hash, ops->func_hash->filter_hash,
-				     subops->func_hash->filter_hash);
+		ret = intersect_hash(&notrace_hash, ops->func_hash->notrace_hash,
+				     subops->func_hash->notrace_hash);
 		if (ret < 0) {
-			free_ftrace_hash(filter_hash);
 			free_ftrace_hash(notrace_hash);
 			return ret;
 		}
I can confirm that the call sites of *kernel_text_address are no longer
patched after applying the above patch, if the functions are set as
notrace in each ftrace instance. I can send out a fix patch if this
sounds like a proper fix.
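For reference, a toy model of the intended semantics (not kernel code;
the struct and helper below are hypothetical): the manager ops should
trace an ip if any subops' filter accepts it (union), and should skip
it only if all subops mark it notrace (intersection), which is why
intersect_hash() must be fed the notrace hashes rather than the filter
hashes:

#include <stdbool.h>

struct toy_subops {
        bool (*filter)(unsigned long ip);  /* true: this subops traces ip */
        bool (*notrace)(unsigned long ip); /* true: this subops skips ip  */
};

static bool manager_enables_ip(unsigned long ip,
                               const struct toy_subops *sub, int nr)
{
        bool any_filter = false, all_notrace = true;

        for (int i = 0; i < nr; i++) {
                any_filter = any_filter || sub[i].filter(ip);    /* union        */
                all_notrace = all_notrace && sub[i].notrace(ip); /* intersection */
        }
        return any_filter && !all_notrace;
}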
>
> >
> > >
> > > E.g. the ftrace selftest:
> > > | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> > >
> > > will generate a lot of traces.
> > >
> > > Now, if we add:
> > > --8<--
> > > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > index ff88f97e41fb..4f30a4d81d99 100644
> > > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > @@ -84,6 +84,7 @@ cd $INSTANCE2
> > > do_test '*rcu*' 'rcu'
> > > cd $WD
> > > cd $INSTANCE3
> > > +echo '*kernel_text_address' > set_ftrace_notrace
> > > echo function_graph > current_tracer
> > >
> > > sleep 1
> > > -->8--
> > >
> > > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > > but still enables the *kernel_text_address traces. (Note that there are
> > > no filters in the test, so *all* ftrace recs will be enabled.)
> > >
> > > Are we holding the graph tracer wrong?
> >
> > What do you get when you do:
> >
> > # grep kernel_text_address available_filter_functions
> >
> > ?
>
> We get:
> kernel_text_address
> __kernel_text_address
>
> >
> > >
> > >
> > > Happy thanksgiving!
> > > Björn
> >
> > And Merry Christmas!
> >
> > -- Steve
> >
>
> And Happy New Year!
>
>
> Andy
Thanks,
Andy
Andy!

"atomic" spelling in the Subject line.

Andy Chiu <andybnac@gmail.com> writes:

> Changes in v3:
> - Add a fix tag for patch 1
> - Add a data fence before sending out remote fence.i (6)
> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/

Hmm, the fixes tag was not included.

Also, there were a lot of comments on v2 that were not addressed:

* Minor spelling nits
* Breaking DIRECT_CALL, and including Puranjay's CALL_OPS work in the
  same series for DIRECT_CALL, to avoid breakage.

I'll have a look at the barriers (which came up at Plumbers)!


Cheers,
Björn
On Tue, Dec 3, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Andy!
>
> "atomic" spelling in the Subject line.

Sorry, I will fix it.

>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > Changes in v3:
> > - Add a fix tag for patch 1
> > - Add a data fence before sending out remote fence.i (6)
> > - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Hmm, the fixes tag was not included.

Do you suggest adding a fix tag to the entire series? Or are there any
patches that are missing the fix tag? I am not sure if this is a fix,
since we defeatured PREEMPT 2 years ago.

>
> Also, there were a lot of comments on v2 that were not addressed:
>
> * Minor spelling nits
> * Breaking DIRECT_CALL, and including Puranjay's CALL_OPS work in the
>   same series for DIRECT_CALL, to avoid breakage.

Sorry, I didn't get that at Plumbers. Yes, I can test and merge
Puranjay's series and send a v4.

>
> I'll have a look at the barriers (which came up at Plumbers)!
>
>
> Cheers,
> Björn

Thanks,
Andy
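On the barrier question, here is a hedged sketch of the ordering that
patch 6 ("riscv: add a data fence for CMODX in the kernel mode") is
concerned with. The helper is hypothetical, and the exact fence used
by the series is an assumption here, not a statement of its code:

#include <stdint.h>

static void publish_insn_then_remote_fence_i(volatile uint32_t *site,
                                             uint32_t insn)
{
        *site = insn;  /* the code modification is a plain data store */

        /* Make the store visible before the write/IPI that asks remote
         * harts to flush; otherwise a remote hart could execute fence.i
         * and still fetch the stale instruction. */
        __asm__ volatile("fence w, w" ::: "memory");

        /* ...now send IPIs; each remote hart runs:
         *      __asm__ volatile("fence.i");
         * so its instruction fetches observe the published store. */
}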
On 03.12.2024 15:18, Björn Töpel wrote:
> Andy!
>
> "atomic" spelling in the Subject line.
>
> Andy Chiu <andybnac@gmail.com> writes:
>
>> Changes in v3:
>> - Add a fix tag for patch 1
>> - Add a data fence before sending out remote fence.i (6)
>> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Hmm, the fixes tag was not included.
>
> Also, there were a lot of comments on v2 that were not addressed:
>
> * Minor spelling nits
> * Breaking DIRECT_CALL, and including Puranjay's CALL_OPS work in the
>   same series for DIRECT_CALL, to avoid breakage.

Yes, FTRACE_WITH_DIRECT_CALLS is currently broken. If I try to insmod
samples/ftrace/ftrace-direct.ko, it reports a failure:

[  179.531472] ------------[ ftrace bug ]------------
[  179.531761] ftrace failed to modify
[  179.531786] [<ffffffff8005f9ac>] wake_up_process+0x0/0x24
[  179.532577]  actual:   97:e2:fa:ff:13:00:00:00
[  179.533211] Setting ftrace call site to call ftrace function
[  179.534409] ftrace record flags: 99980001
[  179.534692]  (1) tramp: ftrace_caller+0x0/0x34 (call_direct_funcs+0x0/0x14)
[  179.534692]  expected tramp: ffffffff01b0d000
...

>
> I'll have a look at the barriers (which came up at Plumbers)!

Thank you! After this series and the CALL_OPS work are done, dynamic
Ftrace for RISC-V will be even more valuable in production use cases.

>
>
> Cheers,
> Björn
>

Regards,
Evgenii
Evgenii Shatokhin <e.shatokhin@yadro.com> writes:

> On 03.12.2024 15:18, Björn Töpel wrote:
>> Andy!
>>
>> "atomic" spelling in the Subject line.
>>
>> Andy Chiu <andybnac@gmail.com> writes:
>>
>>> Changes in v3:
>>> - Add a fix tag for patch 1
>>> - Add a data fence before sending out remote fence.i (6)
>>> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>>
>> Hmm, the fixes tag was not included.
>>
>> Also, there were a lot of comments on v2 that were not addressed:
>>
>> * Minor spelling nits
>> * Breaking DIRECT_CALL, and including Puranjay's CALL_OPS work in the
>>   same series for DIRECT_CALL, to avoid breakage.
>
> Yes, FTRACE_WITH_DIRECT_CALLS is currently broken. If I try to insmod
> samples/ftrace/ftrace-direct.ko, it reports a failure:
>
> [  179.531472] ------------[ ftrace bug ]------------
> [  179.531761] ftrace failed to modify
> [  179.531786] [<ffffffff8005f9ac>] wake_up_process+0x0/0x24
> [  179.532577]  actual:   97:e2:fa:ff:13:00:00:00
> [  179.533211] Setting ftrace call site to call ftrace function
> [  179.534409] ftrace record flags: 99980001
> [  179.534692]  (1) tramp: ftrace_caller+0x0/0x34 (call_direct_funcs+0x0/0x14)
> [  179.534692]  expected tramp: ffffffff01b0d000
> ...

And just a regular Ubuntu 24.10 will fail with all subsystems using BPF
trampolines, e.g.

------------[ ftrace bug ]------------
ftrace failed to modify
[<ffffffff80250d98>] bpf_lsm_file_open+0x0/0x1c

CALL_OPS is definitely a must for this series.


Björn
Hi,

On 27.11.2024 20:29, Andy Chiu wrote:
> This series makes atomic code patching possible in riscv ftrace. A
> direct benefit of this is that we can get rid of stop_machine() when
> patching function entries. This also makes it possible to run ftrace
> with full kernel preemption. Before this series, the kernel initializes
> patchable function entries to NOP4 + NOP4. To start tracing, it updates
> entries to AUIPC + JALR while holding other cores in stop_machine.
> stop_machine() is required because it is impossible to update 2
> instructions and have them be seen atomically. And preemption must be
> prevented, as kernel preemption allows a process to be scheduled out
> while executing one of these instruction pairs.
>
> This series addresses the problem by initializing the first NOP4 to
> AUIPC. So, atomic patching is possible because the kernel only has to
> update one instruction. As long as the instruction is naturally aligned,
> it is expected to be updated atomically.
>
> However, the address range of the ftrace trampoline is limited to +-2K
> from ftrace_caller after applying this series. This issue is expected
> to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> data in front of patchable functions and can use it to direct execution
> out to any custom trampolines.
>
> The series is composed of three parts. The first part cleans up
> existing issues when the kernel is compiled with Clang. The second part
> modifies the ftrace code patching mechanism (2-4) as mentioned above.
> The third prepares ftrace to run with kernel preemption (5, 6).
>
> An ongoing fix:
>
> Since there is no room for marking *kernel_text_address as notrace [1]
> at the source code level, there is a significant performance regression
> when using function_graph with TRACE_IRQFLAGS enabled. There can be as
> many as 8 graph handlers called on each function entry. The current
> workaround requires us to echo "*kernel_text_address" into
> set_ftrace_notrace before starting the trace. However, we observed that
> the kernel still enables the patch sites in some cases even with
> *kernel_text_address properly added to the file. While the root cause
> is still under investigation, we consider that it should not hold back
> the code patching, in order to unblock the call_ops part.
>
> [1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/
>
> Changes in v3:
> - Add a fix tag for patch 1
> - Add a data fence before sending out remote fence.i (6)
> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Changes in v2:
> - Drop patch 1 as it is merged through fixes.
> - Drop patch 2, which converts kernel_text_address into notrace, as
>   users can prevent tracing it by configuring the tracefs.
> - Use a more generic way in kconfig to align functions.
> - Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com
>
>
> Andy Chiu (7):
>   riscv: ftrace: support fastcc in Clang for WITH_ARGS
>   riscv: ftrace: align patchable functions to 4 Byte boundary
>   riscv: ftrace: prepare ftrace for atomic code patching
>   riscv: ftrace: do not use stop_machine to update code
>   riscv: vector: Support calling schedule() for preemptible Vector
>   riscv: add a data fence for CMODX in the kernel mode
>   riscv: ftrace: support PREEMPT
>
>  arch/riscv/Kconfig                 |   4 +-
>  arch/riscv/include/asm/ftrace.h    |  11 +++
>  arch/riscv/include/asm/processor.h |   5 ++
>  arch/riscv/include/asm/vector.h    |  22 ++++-
>  arch/riscv/kernel/asm-offsets.c    |   7 ++
>  arch/riscv/kernel/ftrace.c         | 133 ++++++++++++-----------------
>  arch/riscv/kernel/mcount-dyn.S     |  25 ++++--
>  arch/riscv/mm/cacheflush.c         |  15 +++-
>  8 files changed, 135 insertions(+), 87 deletions(-)
> ---
> base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
> --
> 2.39.3 (Apple Git-145)

I have tested this series in a QEMU VM (-machine virt) with preemptible
kernels, CONFIG_PREEMPT=y.

No issues have been revealed so far.

One of the kernels was built by GCC 13.2.0 (with the patch for minimum
alignment added on top of it), the other - with LLVM 17.0.6.

In both cases, the basic boottime tests for Ftrace passed.

Switching tracers between nop, function, function_graph and blk in a
loop in parallel with stress-ng and with network load for several hours
did not reveal any problems either. Kernel crashes happened in this
scenario a year ago, but now it runs clean, good!

Function redirection via Ftrace seems to work OK too.

The size of the .text section increased slightly after this series, by
0.35% - 0.38%, probably because of function alignment:
* 12075 KB => 12121 KB (GCC)
* 12167 KB => 12209 KB (LLVM/clang)

Not sure how to test the vector-related part though, "[PATCH v3 5/7]
riscv: vector: Support calling schedule() for preemptible Vector".

For all other patches in the series:
Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>

Regards,
Evgenii
Hi Evgenii,

On Mon, Dec 2, 2024 at 3:58 PM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
>
> Hi,
>
> On 27.11.2024 20:29, Andy Chiu wrote:
> > This series makes atomic code patching possible in riscv ftrace. A
> > direct benefit of this is that we can get rid of stop_machine() when
> > patching function entries. This also makes it possible to run ftrace
> > with full kernel preemption. Before this series, the kernel initializes
> > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > entries to AUIPC + JALR while holding other cores in stop_machine.
> > stop_machine() is required because it is impossible to update 2
> > instructions and have them be seen atomically. And preemption must be
> > prevented, as kernel preemption allows a process to be scheduled out
> > while executing one of these instruction pairs.
> >
> > This series addresses the problem by initializing the first NOP4 to
> > AUIPC. So, atomic patching is possible because the kernel only has to
> > update one instruction. As long as the instruction is naturally aligned,
> > it is expected to be updated atomically.
> >
> > However, the address range of the ftrace trampoline is limited to +-2K
> > from ftrace_caller after applying this series. This issue is expected
> > to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> > data in front of patchable functions and can use it to direct execution
> > out to any custom trampolines.
> >
> > The series is composed of three parts. The first part cleans up
> > existing issues when the kernel is compiled with Clang. The second part
> > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > The third prepares ftrace to run with kernel preemption (5, 6).
> >
> > An ongoing fix:
> >
> > Since there is no room for marking *kernel_text_address as notrace [1]
> > at the source code level, there is a significant performance regression
> > when using function_graph with TRACE_IRQFLAGS enabled. There can be as
> > many as 8 graph handlers called on each function entry. The current
> > workaround requires us to echo "*kernel_text_address" into
> > set_ftrace_notrace before starting the trace. However, we observed that
> > the kernel still enables the patch sites in some cases even with
> > *kernel_text_address properly added to the file. While the root cause
> > is still under investigation, we consider that it should not hold back
> > the code patching, in order to unblock the call_ops part.
> >
> > [1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/
> >
> > Changes in v3:
> > - Add a fix tag for patch 1
> > - Add a data fence before sending out remote fence.i (6)
> > - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
> >
> > Changes in v2:
> > - Drop patch 1 as it is merged through fixes.
> > - Drop patch 2, which converts kernel_text_address into notrace, as
> >   users can prevent tracing it by configuring the tracefs.
> > - Use a more generic way in kconfig to align functions.
> > - Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com
> >
> >
> > Andy Chiu (7):
> >   riscv: ftrace: support fastcc in Clang for WITH_ARGS
> >   riscv: ftrace: align patchable functions to 4 Byte boundary
> >   riscv: ftrace: prepare ftrace for atomic code patching
> >   riscv: ftrace: do not use stop_machine to update code
> >   riscv: vector: Support calling schedule() for preemptible Vector
> >   riscv: add a data fence for CMODX in the kernel mode
> >   riscv: ftrace: support PREEMPT
> >
> >  arch/riscv/Kconfig                 |   4 +-
> >  arch/riscv/include/asm/ftrace.h    |  11 +++
> >  arch/riscv/include/asm/processor.h |   5 ++
> >  arch/riscv/include/asm/vector.h    |  22 ++++-
> >  arch/riscv/kernel/asm-offsets.c    |   7 ++
> >  arch/riscv/kernel/ftrace.c         | 133 ++++++++++++-----------------
> >  arch/riscv/kernel/mcount-dyn.S     |  25 ++++--
> >  arch/riscv/mm/cacheflush.c         |  15 +++-
> >  8 files changed, 135 insertions(+), 87 deletions(-)
> > ---
> > base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
> > --
> > 2.39.3 (Apple Git-145)
>
> I have tested this series in a QEMU VM (-machine virt) with preemptible
> kernels, CONFIG_PREEMPT=y.
>
> No issues have been revealed so far.
>
> One of the kernels was built by GCC 13.2.0 (with the patch for minimum
> alignment added on top of it), the other - with LLVM 17.0.6.
>
> In both cases, the basic boottime tests for Ftrace passed.
>
> Switching tracers between nop, function, function_graph and blk in a
> loop in parallel with stress-ng and with network load for several hours
> did not reveal any problems either. Kernel crashes happened in this
> scenario a year ago, but now it runs clean, good!
>
> Function redirection via Ftrace seems to work OK too.
>
> The size of the .text section increased slightly after this series, by
> 0.35% - 0.38%, probably because of function alignment:
> * 12075 KB => 12121 KB (GCC)
> * 12167 KB => 12209 KB (LLVM/clang)
>
> Not sure how to test the vector-related part though, "[PATCH v3 5/7]
> riscv: vector: Support calling schedule() for preemptible Vector".

This part is exercised as long as the kernel is compiled with both
RISCV_ISA_V and PREEMPT, and the hardware supports V. And thanks for
the extensive testing!

>
> For all other patches in the series:
> Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
>
> Regards,
> Evgenii