[PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Andy Chiu 1 year ago
This series makes atomic code patching possible in riscv ftrace. A
direct benefit of this is that we can get rid of stop_machine() when
patching function entries. This also makes it possible to run ftrace
with full kernel preemption. Before this series, the kernel initializes
patchable function entries to NOP4 + NOP4. To start tracing, it updates
entries to AUIPC + JALR while holding other cores in stop_machine().
stop_machine() is required because it is impossible to update the two
instructions and have them be observed atomically. Preemption must also
be prevented, as kernel preemption allows a process to be scheduled out
while executing on one of these instruction pairs.

This series addresses the problem by initializing the first NOP4 to
AUIPC. Atomic patching then becomes possible because the kernel only has
to update one instruction. As long as that instruction is naturally
aligned, it is expected to be updated atomically.

However, after applying this series the address range of the ftrace
trampoline is limited to +-2K from ftrace_caller. This limitation is
expected to be lifted by Puranjay's CALL_OPS, which adds 8 bytes of
naturally aligned data in front of patchable functions and uses it to
direct execution out to any custom trampoline.
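
For context, the +-2K figure follows from the signed 12-bit JALR
immediate. A rough sketch of the reachability constraint (illustrative
only, not code from this series):

#include <stdbool.h>
#include <stdint.h>

/*
 * Once the AUIPC half of the pair is fixed, patching only the JALR
 * leaves a signed 12-bit displacement, so the trampoline must sit
 * within +-2 KiB of the base address the pre-set AUIPC produces
 * (around ftrace_caller in this series).
 */
static bool jalr_can_reach(uintptr_t auipc_base, uintptr_t tramp)
{
        intptr_t off = (intptr_t)(tramp - auipc_base);

        return off >= -2048 && off <= 2047;
}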

The series is composed of three parts. The first part cleans up
existing issues when the kernel is compiled with clang. The second part
(2-4) modifies the ftrace code patching mechanism as mentioned above.
The final part (5, 6) prepares ftrace to run with kernel preemption.

An ongoing fix:

Since there is no room for marking *kernel_text_address as notrace[1] at
the source code level, there is a significant performance regression when
using function_graph with TRACE_IRQFLAGS enabled. As many as 8 graph
handlers can be called on each function entry. The current workaround is
to echo "*kernel_text_address" into set_ftrace_notrace before starting
the trace. However, we observed that the kernel still enables the patch
sites in some cases even with *kernel_text_address properly added to that
file. While the root cause is still under investigation, we consider that
it should not hold back the code patching, in order to unblock the
call_ops part.

[1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/

Changes in v3:
- Add a fix tag for patch 1
- Add a data fence before sending out remote fence.i (6); see the ordering sketch after this list
- Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
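
For reference, a rough sketch of the ordering that the data-fence change
in (6) is about; the helper name below is hypothetical, and the real code
lives in arch/riscv/mm/cacheflush.c:

#include <stdint.h>

/* Assumed helper that asks remote harts to execute fence.i (IPI/SBI). */
extern void request_remote_fence_i(void);

/*
 * The new instruction word must be globally visible before any remote
 * hart runs fence.i, otherwise that hart could still fetch stale code.
 * Hence a data fence between the store and the remote fence.i request.
 */
static void patch_site_and_flush(uint32_t *site, uint32_t insn)
{
        __atomic_store_n(site, insn, __ATOMIC_RELAXED);
        __atomic_thread_fence(__ATOMIC_RELEASE);  /* the added data fence */
        request_remote_fence_i();
}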

Changes in v2:
- Drop patch 1 as it is merged through fixes.
- Drop patch 2, which converted kernel_text_address into notrace, as
  users can prevent tracing it by configuring tracefs.
- Use a more generic way in kconfig to align functions.
- Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com


Andy Chiu (7):
  riscv: ftrace: support fastcc in Clang for WITH_ARGS
  riscv: ftrace: align patchable functions to 4 Byte boundary
  riscv: ftrace: prepare ftrace for atomic code patching
  riscv: ftrace: do not use stop_machine to update code
  riscv: vector: Support calling schedule() for preemptible Vector
  riscv: add a data fence for CMODX in the kernel mode
  riscv: ftrace: support PREEMPT

 arch/riscv/Kconfig                 |   4 +-
 arch/riscv/include/asm/ftrace.h    |  11 +++
 arch/riscv/include/asm/processor.h |   5 ++
 arch/riscv/include/asm/vector.h    |  22 ++++-
 arch/riscv/kernel/asm-offsets.c    |   7 ++
 arch/riscv/kernel/ftrace.c         | 133 ++++++++++++-----------------
 arch/riscv/kernel/mcount-dyn.S     |  25 ++++--
 arch/riscv/mm/cacheflush.c         |  15 +++-
 8 files changed, 135 insertions(+), 87 deletions(-)
---
base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
-- 
2.39.3 (Apple Git-145)
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Björn Töpel 1 year ago
Adding Steven.

Andy Chiu <andybnac@gmail.com> writes:

> This series makes atmoic code patching possible in riscv ftrace. A
> direct benefit of this is that we can get rid of stop_machine() when
> patching function entries. This also makes it possible to run ftrace
> with full kernel preemption. Before this series, the kernel initializes
> patchable function entries to NOP4 + NOP4. To start tracing, it updates
> entries to AUIPC + JALR while holding other cores in stop_machine.
> stop_machine() is required because it is impossible to update 2
> instructions, and be seen atomically. And preemption must have to be
> prevented, as kernel preemption allows process to be scheduled out while
> executing on one of these instruction pairs.
>
> This series addresses the problem by initializing the first NOP4 to
> AUIPC. So, atmoic patching is possible because the kernel only has to
> update one instruction. As long as the instruction is naturally aligned,
> then it is expected to be updated atomically.
>
> However, the address range of the ftrace trampoline is limited to +-2K
> from ftrace_caller after appplying this series. This issue is expected
> to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> data in front of pacthable functions and can  use it to direct execution
> out to any custom trampolines.
>
> The series is composed by three parts. The first part cleans up the
> existing issues when the kernel is compiled with clang.The second part
> modifies the ftrace code patching mechanism (2-4) as mentioned above.
> Then prepare ftrace to be able to run with kernel preemption (5,6)
>
> An ongoing fix:
>
> Since there is no room for marking *kernel_text_address as notrace[1] at
> source code level, there is a significant performance regression when
> using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> 8 graph handler being called in each function-entry. The current
> workaround requires us echo "*kernel_text_address" into
> set_ftrace_notrace before starting the trace. However, we observed that
> the kernel still enables the patch site in some cases even with
> *kernel_text_address properly added in the file While the root cause is
> still under investagtion, we consider that it should not be the reason
> for holding back the code patching, in order to unblock the call_ops
> part.

Maybe Steven knows this off the top of his head!

As Andy points out, "*kernel_text_address" is used in the stack
unwinding on RISC-V. So, if you trace without filtering *and* with
TRACE_IRQFLAGS enabled, you will drown in traces.

E.g. the ftrace selftest:
  | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc

will generate a lot of traces.

Now, if we add:
--8<--
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
index ff88f97e41fb..4f30a4d81d99 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
@@ -84,6 +84,7 @@ cd $INSTANCE2
 do_test '*rcu*' 'rcu'
 cd $WD
 cd $INSTANCE3
+echo '*kernel_text_address' > set_ftrace_notrace
 echo function_graph > current_tracer
 
 sleep 1
-->8--

The graph tracer does not honor the "set_ftrace_notrace" in $INSTANCE3,
and still enables the *kernel_text_address traces. (Note that there are
no filters in the test, so *all* ftrace recs will be enabled.)

Are we holding the graph tracer wrong?


Happy thanksgiving!
Björn
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Steven Rostedt 11 months, 3 weeks ago
On Wed, 27 Nov 2024 22:25:57 +0100
Björn Töpel <bjorn@kernel.org> wrote:

> Adding Steven.

And this has been in my draft folder for almost a month :-p

I kept coming to this email, but then got distracted by something else.

> 
> Andy Chiu <andybnac@gmail.com> writes:
> 
> > This series makes atmoic code patching possible in riscv ftrace. A
> > direct benefit of this is that we can get rid of stop_machine() when
> > patching function entries. This also makes it possible to run ftrace
> > with full kernel preemption. Before this series, the kernel initializes
> > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > entries to AUIPC + JALR while holding other cores in stop_machine.
> > stop_machine() is required because it is impossible to update 2
> > instructions, and be seen atomically. And preemption must have to be
> > prevented, as kernel preemption allows process to be scheduled out while
> > executing on one of these instruction pairs.
> >
> > This series addresses the problem by initializing the first NOP4 to
> > AUIPC. So, atmoic patching is possible because the kernel only has to
> > update one instruction. As long as the instruction is naturally aligned,
> > then it is expected to be updated atomically.
> >
> > However, the address range of the ftrace trampoline is limited to +-2K
> > from ftrace_caller after appplying this series. This issue is expected
> > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > data in front of pacthable functions and can  use it to direct execution
> > out to any custom trampolines.
> >
> > The series is composed by three parts. The first part cleans up the
> > existing issues when the kernel is compiled with clang.The second part
> > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > Then prepare ftrace to be able to run with kernel preemption (5,6)
> >
> > An ongoing fix:
> >
> > Since there is no room for marking *kernel_text_address as notrace[1] at
> > source code level, there is a significant performance regression when
> > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > 8 graph handler being called in each function-entry. The current
> > workaround requires us echo "*kernel_text_address" into
> > set_ftrace_notrace before starting the trace. However, we observed that
> > the kernel still enables the patch site in some cases even with
> > *kernel_text_address properly added in the file While the root cause is
> > still under investagtion, we consider that it should not be the reason
> > for holding back the code patching, in order to unblock the call_ops
> > part.  
> 
> Maybe Steven knows this from the top of his head!
> 
> As Andy points out, "*kernel_text_address" is used in the stack
> unwinding on RISC-V. So, if you do a tracing without filtering *and*
> TRACE_IRQFLAGS, one will drown in traces.

I tested set_ftrace_notrace on x86 and the function graph tracer does honor
it. I wonder if there's a kernel
		
> 
> E.g. the ftrace selftest:
>   | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> 
> will generate a lot of traces.
> 
> Now, if we add:
> --8<--
> diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> index ff88f97e41fb..4f30a4d81d99 100644
> --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> @@ -84,6 +84,7 @@ cd $INSTANCE2
>  do_test '*rcu*' 'rcu'
>  cd $WD
>  cd $INSTANCE3
> +echo '*kernel_text_address' > set_ftrace_notrace
>  echo function_graph > current_tracer
>  
>  sleep 1
> -->8--  
> 
> The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> but still enable the *kernel_text_address traces. (Note that there are
> no filters in the test, so *all* ftrace recs will be enabled.)
> 
> Are we holding the graph tracer wrong?

What do you get when you do:

  # grep kernel_text_address available_filter_functions

?

> 
> 
> Happy thanksgiving!
> Björn

And Merry Christmas!

-- Steve
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Andy Chiu 11 months, 2 weeks ago
Steven Rostedt <rostedt@goodmis.org> wrote on Tue, Dec 24, 2024 at 11:15 AM:
>
> On Wed, 27 Nov 2024 22:25:57 +0100
> Björn Töpel <bjorn@kernel.org> wrote:
>
> > Adding Steven.
>
> And this has been in my draft folder for almost a month :-p
>
> I kept coming to this email, but then got distracted by something else.
>
> >
> > Andy Chiu <andybnac@gmail.com> writes:
> >
> > > This series makes atmoic code patching possible in riscv ftrace. A
> > > direct benefit of this is that we can get rid of stop_machine() when
> > > patching function entries. This also makes it possible to run ftrace
> > > with full kernel preemption. Before this series, the kernel initializes
> > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > stop_machine() is required because it is impossible to update 2
> > > instructions, and be seen atomically. And preemption must have to be
> > > prevented, as kernel preemption allows process to be scheduled out while
> > > executing on one of these instruction pairs.
> > >
> > > This series addresses the problem by initializing the first NOP4 to
> > > AUIPC. So, atmoic patching is possible because the kernel only has to
> > > update one instruction. As long as the instruction is naturally aligned,
> > > then it is expected to be updated atomically.
> > >
> > > However, the address range of the ftrace trampoline is limited to +-2K
> > > from ftrace_caller after appplying this series. This issue is expected
> > > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > > data in front of pacthable functions and can  use it to direct execution
> > > out to any custom trampolines.
> > >
> > > The series is composed by three parts. The first part cleans up the
> > > existing issues when the kernel is compiled with clang.The second part
> > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > Then prepare ftrace to be able to run with kernel preemption (5,6)
> > >
> > > An ongoing fix:
> > >
> > > Since there is no room for marking *kernel_text_address as notrace[1] at
> > > source code level, there is a significant performance regression when
> > > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > > 8 graph handler being called in each function-entry. The current
> > > workaround requires us echo "*kernel_text_address" into
> > > set_ftrace_notrace before starting the trace. However, we observed that
> > > the kernel still enables the patch site in some cases even with
> > > *kernel_text_address properly added in the file While the root cause is
> > > still under investagtion, we consider that it should not be the reason
> > > for holding back the code patching, in order to unblock the call_ops
> > > part.
> >
> > Maybe Steven knows this from the top of his head!
> >
> > As Andy points out, "*kernel_text_address" is used in the stack
> > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > TRACE_IRQFLAGS, one will drown in traces.
>
> I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> it. I wonder if there's a kernel

After checking the log buffer, I can confirm that riscv also honors
set_ftrace_notrace: it does not print "*kernel_text_address" entries
in the trace buffer. Sorry for not making this clear.

However, the problem is that the patch sites for "*kernel_text_address"
were enabled, allowing the code to flow into ftrace and parts of fgraph.
In particular, with TRACE_IRQFLAGS enabled, these functions are called
extensively, causing the significant slowdown.

IIUC, it is reasonable to enable a patch site if there is at least one
tracing instance that requests the function. However, in another
experiment, the patch sites of "*kernel_text_address" are enabled
even when all instances have them disabled. Here is a way to reproduce
it:

cd /sys/kernel/
mount -t tracefs t tracing/
cd tracing/
mkdir instances/1
mkdir instances/2
cd instances/1/
echo *kernel_text_address > set_ftrace_notrace
echo *sched* > set_ftrace_filter # this is just to make the kernel more responsive
cat set_ftrace_notrace
echo function_graph > current_tracer
cd ../2/
echo *kernel_text_address > set_ftrace_notrace
cat set_ftrace_notrace
echo function_graph > current_tracer

To figure out this problem, it is probably worth understanding why the
patch site is enabled after the above sequence:

It seems that the ftrace_ops passed into __ftrace_hash_rec_update() at
the start of the second tracer has an empty notrace hash, which leads to
the eventual activation of the patch sites. However, graph_ops does not
have an empty notrace hash when the user starts the first graph tracer.
I am trying to understand this part of the code, and the relationship
between graph_ops and the ops in the fgraph_array.
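
To make that expectation concrete, here is a rough model of the per-ops
decision (simplified; the real logic is in __ftrace_hash_rec_update()
and also handles ref-counting, IPMODIFY, etc.):

#include <stdbool.h>

/*
 * A call site is enabled for an ops when it matches the filter hash
 * (an empty filter hash matches everything) and is NOT in the notrace
 * hash (an empty notrace hash excludes nothing).  So an ops that ends
 * up with an unexpectedly empty notrace hash enables every site, which
 * matches what we observe after starting the second tracer.
 */
static bool site_enabled(bool filter_empty, bool in_filter,
                         bool notrace_empty, bool in_notrace)
{
        bool matched  = filter_empty || in_filter;
        bool excluded = !notrace_empty && in_notrace;

        return matched && !excluded;
}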

>
> >
> > E.g. the ftrace selftest:
> >   | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> >
> > will generate a lot of traces.
> >
> > Now, if we add:
> > --8<--
> > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > index ff88f97e41fb..4f30a4d81d99 100644
> > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > @@ -84,6 +84,7 @@ cd $INSTANCE2
> >  do_test '*rcu*' 'rcu'
> >  cd $WD
> >  cd $INSTANCE3
> > +echo '*kernel_text_address' > set_ftrace_notrace
> >  echo function_graph > current_tracer
> >
> >  sleep 1
> > -->8--
> >
> > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > but still enable the *kernel_text_address traces. (Note that there are
> > no filters in the test, so *all* ftrace recs will be enabled.)
> >
> > Are we holding the graph tracer wrong?
>
> What do you get when you do:
>
>   # grep kernel_text_address available_filter_functions
>
> ?

We get:
kernel_text_address
__kernel_text_address

>
> >
> >
> > Happy thanksgiving!
> > Björn
>
> And Merry Christmas!
>
> -- Steve
>

And Happy New Year!


Andy
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Andy Chiu 11 months, 1 week ago
Andy Chiu <andybnac@gmail.com> wrote on Mon, Dec 30, 2024 at 3:08 AM:
>
> Steven Rostedt <rostedt@goodmis.org> wrote on Tue, Dec 24, 2024 at 11:15 AM:
> >
> > On Wed, 27 Nov 2024 22:25:57 +0100
> > Björn Töpel <bjorn@kernel.org> wrote:
> >
> > > Adding Steven.
> >
> > And this has been in my draft folder for almost a month :-p
> >
> > I kept coming to this email, but then got distracted by something else.
> >
> > >
> > > Andy Chiu <andybnac@gmail.com> writes:
> > >
> > > > This series makes atmoic code patching possible in riscv ftrace. A
> > > > direct benefit of this is that we can get rid of stop_machine() when
> > > > patching function entries. This also makes it possible to run ftrace
> > > > with full kernel preemption. Before this series, the kernel initializes
> > > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > > stop_machine() is required because it is impossible to update 2
> > > > instructions, and be seen atomically. And preemption must have to be
> > > > prevented, as kernel preemption allows process to be scheduled out while
> > > > executing on one of these instruction pairs.
> > > >
> > > > This series addresses the problem by initializing the first NOP4 to
> > > > AUIPC. So, atmoic patching is possible because the kernel only has to
> > > > update one instruction. As long as the instruction is naturally aligned,
> > > > then it is expected to be updated atomically.
> > > >
> > > > However, the address range of the ftrace trampoline is limited to +-2K
> > > > from ftrace_caller after appplying this series. This issue is expected
> > > > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > > > data in front of pacthable functions and can  use it to direct execution
> > > > out to any custom trampolines.
> > > >
> > > > The series is composed by three parts. The first part cleans up the
> > > > existing issues when the kernel is compiled with clang.The second part
> > > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > > Then prepare ftrace to be able to run with kernel preemption (5,6)
> > > >
> > > > An ongoing fix:
> > > >
> > > > Since there is no room for marking *kernel_text_address as notrace[1] at
> > > > source code level, there is a significant performance regression when
> > > > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > > > 8 graph handler being called in each function-entry. The current
> > > > workaround requires us echo "*kernel_text_address" into
> > > > set_ftrace_notrace before starting the trace. However, we observed that
> > > > the kernel still enables the patch site in some cases even with
> > > > *kernel_text_address properly added in the file While the root cause is
> > > > still under investagtion, we consider that it should not be the reason
> > > > for holding back the code patching, in order to unblock the call_ops
> > > > part.
> > >
> > > Maybe Steven knows this from the top of his head!
> > >
> > > As Andy points out, "*kernel_text_address" is used in the stack
> > > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > > TRACE_IRQFLAGS, one will drown in traces.
> >
> > I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> > it. I wonder if there's a kernel
>
> After checking the log buffer, I can confirm that riscv does also
> honor set_ftrace_notrace. It does not print out "*kernel_text_address"
> in the trace buffer. Sorry for not making this clear.
>
> However, the problem is that the patch sites "*kernel_text_address"
> were enabled, allowing the code flow into ftrace and parts of fgraph.
> In particular, with TRACE_IRQFLAGS enabled, the functions are called
> extensively, causing the significant slow down.
>
> IIUC, it is reasonable to enable a patch site if there is at least one
> tracing instance that request the function. However, in another
> experiment, the patch sites of "*kernel_text_address"  are enabled
> even when all instances have them disabled. Here is a way to reproduce
> it:
>
> cd /sys/kernel/
> mount -t tracefs t tracing/
> cd tracing/
> mkdir instances/1
> mkdir instances/2
> cd instances/1/
> echo *kernel_text_address > set_ftrace_notrace
> echo *sched* > set_ftrace_filter # this is just to make the kernel
> more responsive
> cat set_ftrace_notrace
> echo function_graph > current_tracer
> cd ../2/
> echo *kernel_text_address > set_ftrace_notrace
> cat set_ftrace_notrace
> echo function_graph > current_tracer
>
> To figure out this problem, it is probably worth trying to understand
> why the patch site is enabled after the above sequence:
>
> It seems like the ftrace_ops being passed into
> __ftrace_hash_rec_update() at the start of the second tracer has an
> empty notrace hash. This leads to the eventual activation of the patch
> sites. However, graph_ops does not have the empty notrace hash when
> the user starts the first graph tracer. I am trying to understand this
> part of the code, and the relationship between graph_ops and ops in
> the fgraph_array.

Here might be the possible reason: the updated notrace hash for the
manager's ops does not reflect the intersection of all users' notrace
hashes. According to the comment at ftrace_startup_subops():

         *   o If either notrace_hash is empty then the final stays empty
         *      o Otherwise, the final is an intersection between the hashes

It seems that the wrong arguments were passed into intersect_hash():
instead of the users' notrace hashes, the filter hashes were passed into
the function. As a result, even if a function is marked as notrace in
all ftrace instances, the notrace hash in the manager's ops would not
contain that function, leading to an enabled call site at the notrace'd
function. If I understand correctly, we should pass the notrace hashes
instead of the filter hashes, as shown in the following modification.

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4c28dd177ca6..3a194559d220 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3515,18 +3515,17 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int
            ftrace_hash_empty(subops->func_hash->notrace_hash)) {
                notrace_hash = EMPTY_HASH;
        } else {
-               size_bits = max(ops->func_hash->filter_hash->size_bits,
-                               subops->func_hash->filter_hash->size_bits);
+               size_bits = max(ops->func_hash->notrace_hash->size_bits,
+                               subops->func_hash->notrace_hash->size_bits);
                notrace_hash = alloc_ftrace_hash(size_bits);
                if (!notrace_hash) {
-                       free_ftrace_hash(filter_hash);
+                       free_ftrace_hash(notrace_hash);
                        return -ENOMEM;
                }

-               ret = intersect_hash(&notrace_hash, ops->func_hash->filter_hash,
-                                    subops->func_hash->filter_hash);
+               ret = intersect_hash(&notrace_hash, ops->func_hash->notrace_hash,
+                                    subops->func_hash->notrace_hash);
                if (ret < 0) {
-                       free_ftrace_hash(filter_hash);
                        free_ftrace_hash(notrace_hash);
                        return ret;
                }

I can confirm that callsites at *kernel_text_address are no longer
patched after applying the above patch, if they are set as notrace in
each ftrace instance. I can send out a fix patch if this sounds like a
proper fix.

>
> >
> > >
> > > E.g. the ftrace selftest:
> > >   | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> > >
> > > will generate a lot of traces.
> > >
> > > Now, if we add:
> > > --8<--
> > > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > index ff88f97e41fb..4f30a4d81d99 100644
> > > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > @@ -84,6 +84,7 @@ cd $INSTANCE2
> > >  do_test '*rcu*' 'rcu'
> > >  cd $WD
> > >  cd $INSTANCE3
> > > +echo '*kernel_text_address' > set_ftrace_notrace
> > >  echo function_graph > current_tracer
> > >
> > >  sleep 1
> > > -->8--
> > >
> > > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > > but still enable the *kernel_text_address traces. (Note that there are
> > > no filters in the test, so *all* ftrace recs will be enabled.)
> > >
> > > Are we holding the graph tracer wrong?
> >
> > What do you get when you do:
> >
> >   # grep kernel_text_address available_filter_functions
> >
> > ?
>
> We get:
> kernel_text_address
> __kernel_text_address
>
> >
> > >
> > >
> > > Happy thanksgiving!
> > > Björn
> >
> > And Merry Christmas!
> >
> > -- Steve
> >
>
> And Happy New Year!
>
>
> Andy

Thanks,
Andy
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Björn Töpel 1 year ago
Andy!

"atomic" spelling in the Subject line.

Andy Chiu <andybnac@gmail.com> writes:

> Changes in v3:
> - Add a fix tag for patch 1
> - Add a data fence before sending out remote fence.i (6)
> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/

Hmm, the fixes tag was not included.

Also, there were a lot of comments on v2 that were not addressed:

 * Minor spelling nits
 * The series breaking DIRECT_CALL; Puranjay's CALL_OPS work should be
   included in the same series, to avoid that breakage.

I'll have a look at the barriers (which came up at plumbers)!


Cheers,
Björn
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Andy Chiu 1 year ago
Björn Töpel <bjorn@kernel.org> wrote on Tue, Dec 3, 2024 at 8:18 PM:
>
> Andy!
>
> "atomic" spelling in the Subject line.

Sorry, I will fix it

>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > Changes in v3:
> > - Add a fix tag for patch 1
> > - Add a data fence before sending out remote fence.i (6)
> > - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Hmm, the fixes tag was not included.

Do you suggest adding a Fixes tag to the entire series? Or are there any
patches missing the Fixes tag? I am not sure this is a fix, since we
defeatured PREEMPT 2 years ago.

>
> Also, there was a lot of comments from v2 that was not addressed:
>
>  * Minor spelling nits
>  * Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
>    same series for DIRECT_CALL, to avoid breakage.

Sorry, I didn't get that at Plumbers. Yes, I can test and merge
Puranjay's series and send a v4.

>
> I'll have a look at the barriers (which came up at plumbers)!
>
>
> Cheers,
> Björn

Thanks,
Andy
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Evgenii Shatokhin 1 year ago
On 03.12.2024 15:18, Björn Töpel wrote:
> Andy!
> 
> "atomic" spelling in the Subject line.
> 
> Andy Chiu <andybnac@gmail.com> writes:
> 
>> Changes in v3:
>> - Add a fix tag for patch 1
>> - Add a data fence before sending out remote fence.i (6)
>> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
> 
> Hmm, the fixes tag was not included.
> 
> Also, there was a lot of comments from v2 that was not addressed:
> 
>   * Minor spelling nits
>   * Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
>     same series for DIRECT_CALL, to avoid breakage.

Yes, FTRACE_WITH_DIRECT_CALLS is currently broken. If I try to insmod 
samples/ftrace/ftrace-direct.ko, it reports a failure:


[  179.531472] ------------[ ftrace bug ]------------
[  179.531761] ftrace failed to modify
[  179.531786] [<ffffffff8005f9ac>] wake_up_process+0x0/0x24
[  179.532577]  actual:   97:e2:fa:ff:13:00:00:00
[  179.533211] Setting ftrace call site to call ftrace function
[  179.534409] ftrace record flags: 99980001
[  179.534692]  (1)     tramp: ftrace_caller+0x0/0x34 (call_direct_funcs+0x0/0x14)
[  179.534692]  expected tramp: ffffffff01b0d000
...

> 
> I'll have a look at the barriers (which came up at plumbers)!

Thank you!
After this series and the CALL_OPS work are done, dynamic Ftrace for 
RISC-V will be even more valuable in production use cases.

> 
> 
> Cheers,
> Björn
> 

Regards,
Evgenii
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Björn Töpel 1 year ago
Evgenii Shatokhin <e.shatokhin@yadro.com> writes:

> On 03.12.2024 15:18, Björn Töpel wrote:
>> Andy!
>> 
>> "atomic" spelling in the Subject line.
>> 
>> Andy Chiu <andybnac@gmail.com> writes:
>> 
>>> Changes in v3:
>>> - Add a fix tag for patch 1
>>> - Add a data fence before sending out remote fence.i (6)
>>> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>> 
>> Hmm, the fixes tag was not included.
>> 
>> Also, there was a lot of comments from v2 that was not addressed:
>> 
>>   * Minor spelling nits
>>   * Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
>>     same series for DIRECT_CALL, to avoid breakage.
>
> Yes, FTRACE_WITH_DIRECT_CALLS is currently broken. If I try to insmod 
> samples/ftrace/ftrace-direct.ko, it reports a failure:
>
>
> [  179.531472] ------------[ ftrace bug ]------------
> [  179.531761] ftrace failed to modify
> [  179.531786] [<ffffffff8005f9ac>] wake_up_process+0x0/0x24
> [  179.532577]  actual:   97:e2:fa:ff:13:00:00:00
> [  179.533211] Setting ftrace call site to call ftrace function
> [  179.534409] ftrace record flags: 99980001
> [  179.534692]  (1)     tramp: ftrace_caller+0x0/0x34 
> (call_direct_funcs+0x0/0x14)
> [  179.534692]  expected tramp: ffffffff01b0d000
> ...

And a regular Ubuntu 24.10 will fail for all subsystems that use a BPF
trampoline, e.g.
 ------------[ ftrace bug ]------------
 ftrace failed to modify
 [<ffffffff80250d98>] bpf_lsm_file_open+0x0/0x1c

CALL_OPS is definitely a must for this series.


Björn
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Evgenii Shatokhin 1 year ago
Hi,

On 27.11.2024 20:29, Andy Chiu wrote:
> This series makes atmoic code patching possible in riscv ftrace. A
> direct benefit of this is that we can get rid of stop_machine() when
> patching function entries. This also makes it possible to run ftrace
> with full kernel preemption. Before this series, the kernel initializes
> patchable function entries to NOP4 + NOP4. To start tracing, it updates
> entries to AUIPC + JALR while holding other cores in stop_machine.
> stop_machine() is required because it is impossible to update 2
> instructions, and be seen atomically. And preemption must have to be
> prevented, as kernel preemption allows process to be scheduled out while
> executing on one of these instruction pairs.
> 
> This series addresses the problem by initializing the first NOP4 to
> AUIPC. So, atmoic patching is possible because the kernel only has to
> update one instruction. As long as the instruction is naturally aligned,
> then it is expected to be updated atomically.
> 
> However, the address range of the ftrace trampoline is limited to +-2K
> from ftrace_caller after appplying this series. This issue is expected
> to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> data in front of pacthable functions and can  use it to direct execution
> out to any custom trampolines.
> 
> The series is composed by three parts. The first part cleans up the
> existing issues when the kernel is compiled with clang.The second part
> modifies the ftrace code patching mechanism (2-4) as mentioned above.
> Then prepare ftrace to be able to run with kernel preemption (5,6)
> 
> An ongoing fix:
> 
> Since there is no room for marking *kernel_text_address as notrace[1] at
> source code level, there is a significant performance regression when
> using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> 8 graph handler being called in each function-entry. The current
> workaround requires us echo "*kernel_text_address" into
> set_ftrace_notrace before starting the trace. However, we observed that
> the kernel still enables the patch site in some cases even with
> *kernel_text_address properly added in the file While the root cause is
> still under investagtion, we consider that it should not be the reason
> for holding back the code patching, in order to unblock the call_ops
> part.
> 
> [1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/
> 
> Changes in v3:
> - Add a fix tag for patch 1
> - Add a data fence before sending out remote fence.i (6)
> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
> 
> Changes in v2:
> - Drop patch 1 as it is merged through fixes.
> - Drop patch 2, which converts kernel_text_address into notrace. As
>    users can prevent tracing it by configuring the tracefs.
> - Use a more generic way in kconfig to align functions.
> - Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com
> 
> 
> Andy Chiu (7):
>    riscv: ftrace: support fastcc in Clang for WITH_ARGS
>    riscv: ftrace: align patchable functions to 4 Byte boundary
>    riscv: ftrace: prepare ftrace for atomic code patching
>    riscv: ftrace: do not use stop_machine to update code
>    riscv: vector: Support calling schedule() for preemptible Vector
>    riscv: add a data fence for CMODX in the kernel mode
>    riscv: ftrace: support PREEMPT
> 
>   arch/riscv/Kconfig                 |   4 +-
>   arch/riscv/include/asm/ftrace.h    |  11 +++
>   arch/riscv/include/asm/processor.h |   5 ++
>   arch/riscv/include/asm/vector.h    |  22 ++++-
>   arch/riscv/kernel/asm-offsets.c    |   7 ++
>   arch/riscv/kernel/ftrace.c         | 133 ++++++++++++-----------------
>   arch/riscv/kernel/mcount-dyn.S     |  25 ++++--
>   arch/riscv/mm/cacheflush.c         |  15 +++-
>   8 files changed, 135 insertions(+), 87 deletions(-)
> ---
> base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
> --
> 2.39.3 (Apple Git-145)

I have tested this series in a QEMU VM (-machine virt) with preemptible
kernels, CONFIG_PREEMPT=y.

No issues have been revealed so far.

One of the kernels was built with GCC 13.2.0 (with the patch for minimum
alignment added on top of it), the other with LLVM 17.0.6.

In both cases, the basic boottime tests for Ftrace passed.

Switching tracers between nop, function, function_graph and blk in a 
loop in parallel with stress-ng and with network load for several hours 
did not reveal any problems either. Kernel crashes happened in this 
scenario a year ago, but now it runs clean, good!

Function redirection via Ftrace seems to work OK too.

The size of the .text section increased slightly after this series, by
0.35%-0.38%, probably because of the function alignment:
* 12075 KB => 12121 KB (GCC)
* 12167 KB => 12209 KB (LLVM/clang)

Not sure how to test the vector-related part though: "[PATCH v3 5/7]
riscv: vector: Support calling schedule() for preemptible Vector".

For all other patches in the series:
Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>

Regards,
Evgenii
Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
Posted by Andy Chiu 1 year ago
Hi Evgenii,

Evgenii Shatokhin <e.shatokhin@yadro.com> wrote on Mon, Dec 2, 2024 at 3:58 PM:
>
> Hi,
>
> On 27.11.2024 20:29, Andy Chiu wrote:
> > This series makes atmoic code patching possible in riscv ftrace. A
> > direct benefit of this is that we can get rid of stop_machine() when
> > patching function entries. This also makes it possible to run ftrace
> > with full kernel preemption. Before this series, the kernel initializes
> > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > entries to AUIPC + JALR while holding other cores in stop_machine.
> > stop_machine() is required because it is impossible to update 2
> > instructions, and be seen atomically. And preemption must have to be
> > prevented, as kernel preemption allows process to be scheduled out while
> > executing on one of these instruction pairs.
> >
> > This series addresses the problem by initializing the first NOP4 to
> > AUIPC. So, atmoic patching is possible because the kernel only has to
> > update one instruction. As long as the instruction is naturally aligned,
> > then it is expected to be updated atomically.
> >
> > However, the address range of the ftrace trampoline is limited to +-2K
> > from ftrace_caller after appplying this series. This issue is expected
> > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > data in front of pacthable functions and can  use it to direct execution
> > out to any custom trampolines.
> >
> > The series is composed by three parts. The first part cleans up the
> > existing issues when the kernel is compiled with clang.The second part
> > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > Then prepare ftrace to be able to run with kernel preemption (5,6)
> >
> > An ongoing fix:
> >
> > Since there is no room for marking *kernel_text_address as notrace[1] at
> > source code level, there is a significant performance regression when
> > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > 8 graph handler being called in each function-entry. The current
> > workaround requires us echo "*kernel_text_address" into
> > set_ftrace_notrace before starting the trace. However, we observed that
> > the kernel still enables the patch site in some cases even with
> > *kernel_text_address properly added in the file While the root cause is
> > still under investagtion, we consider that it should not be the reason
> > for holding back the code patching, in order to unblock the call_ops
> > part.
> >
> > [1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/
> >
> > Changes in v3:
> > - Add a fix tag for patch 1
> > - Add a data fence before sending out remote fence.i (6)
> > - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
> >
> > Changes in v2:
> > - Drop patch 1 as it is merged through fixes.
> > - Drop patch 2, which converts kernel_text_address into notrace. As
> >    users can prevent tracing it by configuring the tracefs.
> > - Use a more generic way in kconfig to align functions.
> > - Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com
> >
> >
> > Andy Chiu (7):
> >    riscv: ftrace: support fastcc in Clang for WITH_ARGS
> >    riscv: ftrace: align patchable functions to 4 Byte boundary
> >    riscv: ftrace: prepare ftrace for atomic code patching
> >    riscv: ftrace: do not use stop_machine to update code
> >    riscv: vector: Support calling schedule() for preemptible Vector
> >    riscv: add a data fence for CMODX in the kernel mode
> >    riscv: ftrace: support PREEMPT
> >
> >   arch/riscv/Kconfig                 |   4 +-
> >   arch/riscv/include/asm/ftrace.h    |  11 +++
> >   arch/riscv/include/asm/processor.h |   5 ++
> >   arch/riscv/include/asm/vector.h    |  22 ++++-
> >   arch/riscv/kernel/asm-offsets.c    |   7 ++
> >   arch/riscv/kernel/ftrace.c         | 133 ++++++++++++-----------------
> >   arch/riscv/kernel/mcount-dyn.S     |  25 ++++--
> >   arch/riscv/mm/cacheflush.c         |  15 +++-
> >   8 files changed, 135 insertions(+), 87 deletions(-)
> > ---
> > base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
> > --
> > 2.39.3 (Apple Git-145)
>
> I have tested this series in a QEMU VM (-machine virt) with the
> preemptible kernels, CONFIG_PREEMPT=y.
>
> No issues have been revealed so far.
>
> One of the kernels was built by GCC 13.2.0 (with the patch for minimum
> alignment added on top of it), the other - with LLVM 17.0.6.
>
> In both cases, the basic boottime tests for Ftrace passed.
>
> Switching tracers between nop, function, function_graph and blk in a
> loop in parallel with stress-ng and with network load for several hours
> did not reveal any problems either. Kernel crashes happened in this
> scenario a year ago, but now it runs clean, good!
>
> Function redirection via Ftrace seems to work OK too.
>
> The size of .text section increased slightly after this series, by 0.35%
> - 0.38%, probably because of function alignment:
> * 12075 KB => 12121 KB (GCC)
> * 12167 KB => 12209 KB (LLVM/clang)
>
> Not sure, how to test the vector-related part though, "[PATCH v3 5/7]
> riscv: vector: Support calling schedule() for preemptible Vector"

This part is exercised as long as the kernel is compiled with both
RISCV_ISA_V and PREEMPT, and the hardware supports V.

And thanks for the extensive testing

>
> For all other patches in the series:
> Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
>
> Regards,
> Evgenii