On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
>
> > +/* Return the jump target address or 0 */
> > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > +{
> > + switch (insn->opcode.bytes[0]) {
> > + case 0xe0: /* loopne */
> > + case 0xe1: /* loope */
> > + case 0xe2: /* loop */
>
> Oh cute, objtool doesn't know about those, let me go add them.
---
Subject: objtool,x86: Teach decode about LOOP* instructions
With kprobes also needing to follow control flow, it was found that
objtool is missing the branches from the LOOP* instructions.
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
tools/objtool/arch/x86/decode.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c260006106be..1c253b4b7ce0 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -635,6 +635,12 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
 		*type = INSN_CONTEXT_SWITCH;
 		break;
 
+	case 0xe0: /* loopne */
+	case 0xe1: /* loope */
+	case 0xe2: /* loop */
+		*type = INSN_JUMP_CONDITIONAL;
+		break;
+
 	case 0xe8:
 		*type = INSN_CALL;
 		/*
From: Peter Zijlstra
> Sent: 07 September 2022 10:01
>
> On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> >
> > > +/* Return the jump target address or 0 */
> > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > +{
> > > + switch (insn->opcode.bytes[0]) {
> > > + case 0xe0: /* loopne */
> > > + case 0xe1: /* loope */
> > > + case 0xe2: /* loop */
> >
> > Oh cute, objtool doesn't know about those, let me go add them.
Do they ever appear in the kernel?
They are so slow on Intel CPUs that finding one ought to
be deemed a bug!
Have you got jcxz (0xe3) in there?
It is fast on both Intel and AMD CPUs, so it is usable.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 07 September 2022 10:01
> >
> > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > >
> > > > +/* Return the jump target address or 0 */
> > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > +{
> > > > + switch (insn->opcode.bytes[0]) {
> > > > + case 0xe0: /* loopne */
> > > > + case 0xe1: /* loope */
> > > > + case 0xe2: /* loop */
> > >
> > > Oh cute, objtool doesn't know about those, let me go add them.
>
> Do they ever appear in the kernel?
No; that is, not on any of the random vmlinux.o images I checked this
morning.
Still, best to properly decode them anyway.
From: Peter Zijlstra
> Sent: 07 September 2022 10:40
>
> On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 07 September 2022 10:01
> > >
> > > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > > >
> > > > > +/* Return the jump target address or 0 */
> > > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > > +{
> > > > > + switch (insn->opcode.bytes[0]) {
> > > > > + case 0xe0: /* loopne */
> > > > > + case 0xe1: /* loope */
> > > > > + case 0xe2: /* loop */
> > > >
> > > > Oh cute, objtool doesn't know about those, let me go add them.
> >
> > Do they ever appear in the kernel?
>
> No; that is, not on any of the random vmlinux.o images I checked this
> morning.
>
> Still, best to properly decode them anyway.
It is annoying that CPUs with adox/adcx have a slow loop instruction.
You really want to be able to do:
1:	adox ...
	adcx ...
	loop 1b
That would never run at one iteration/clock, but unrolling once
would probably be enough.
What you can do (and it gives the fastest IP checksum loop) is:
1:	jcxz 2f
	....
	lea %rcx,...
	jmp 1b
2:
The extra instructions mean that it needs unrolling 4 times.
I've got over 12 bytes/clock that way.
David
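For readers unfamiliar with the adcx/adox idea above, here is a rough portable C sketch of what such a checksum loop computes. It is not David's assembly and makes no performance claim. The names add_eac() and csum64_sketch() are hypothetical. The two accumulators only stand in for the two independent carry chains that adcx (CF) and adox (OF) provide, and the result is the folded, non-inverted sum of the words as read in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement add on 64 bits: wrap the carry-out back into bit 0. */
static uint64_t add_eac(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/*
 * Hypothetical reference: folded (not inverted) ones' complement sum of
 * 'nwords' 64-bit words at 'buf'. Words are read in host byte order.
 */
static uint16_t csum64_sketch(const void *buf, size_t nwords)
{
	const unsigned char *p = buf;
	uint64_t a = 0, b = 0, w;
	size_t i;

	/* Two accumulators: neither add depends on the other's carry. */
	for (i = 0; i + 1 < nwords; i += 2) {
		memcpy(&w, p + 8 * i, sizeof(w));
		a = add_eac(a, w);
		memcpy(&w, p + 8 * (i + 1), sizeof(w));
		b = add_eac(b, w);
	}
	if (i < nwords) {	/* odd trailing word */
		memcpy(&w, p + 8 * i, sizeof(w));
		a = add_eac(a, w);
	}
	a = add_eac(a, b);

	/* Fold 64 -> 32 -> 16 bits, adding carries back in each time. */
	a = (a & 0xffffffff) + (a >> 32);
	a = (a & 0xffffffff) + (a >> 32);
	a = (a & 0xffff) + (a >> 16);
	a = (a & 0xffff) + (a >> 16);
	return (uint16_t)a;
}
```

Because ones' complement addition is associative and commutative, splitting the words across two accumulators and merging them at the end gives the same sum as a single chain.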
The following commit has been merged into the objtool/core branch of tip:
Commit-ID: 7a7621dfa417aa3715d2a3bd1bdd6cf5018274d0
Gitweb: https://git.kernel.org/tip/7a7621dfa417aa3715d2a3bd1bdd6cf5018274d0
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 07 Sep 2022 11:01:20 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 15 Sep 2022 16:13:55 +02:00
objtool,x86: Teach decode about LOOP* instructions
When 'discussing' control flow Masami mentioned the LOOP* instructions
and I realized objtool doesn't decode them properly.
As it turns out, these instructions are somewhat inefficient and as
such unlikely to be emitted by the compiler (a few vmlinux.o checks
can't find a single one) so this isn't critical, but still, best to
decode them properly.
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net
---
tools/objtool/arch/x86/decode.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c260006..1c253b4 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -635,6 +635,12 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
 		*type = INSN_CONTEXT_SWITCH;
 		break;
 
+	case 0xe0: /* loopne */
+	case 0xe1: /* loope */
+	case 0xe2: /* loop */
+		*type = INSN_JUMP_CONDITIONAL;
+		break;
+
 	case 0xe8:
 		*type = INSN_CALL;
 		/*
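As an illustration of why these opcodes count as conditional jumps, the following is a minimal standalone sketch of how the branch target of an unprefixed LOOP* instruction can be computed. It is not code from objtool or from the kprobes patch. is_loop_opcode() and loop_branch_target() are hypothetical names, and the sketch assumes the plain two-byte encoding: one opcode byte followed by a signed 8-bit displacement, relative to the end of the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for loopne (0xe0), loope (0xe1), loop (0xe2). */
static bool is_loop_opcode(uint8_t op)
{
	return op >= 0xe0 && op <= 0xe2;
}

/*
 * Hypothetical helper: branch target of an unprefixed LOOP* instruction
 * located at 'addr'. The displacement is sign-extended and added to the
 * address of the next instruction (addr + 2).
 */
static uint64_t loop_branch_target(const uint8_t insn[2], uint64_t addr)
{
	assert(is_loop_opcode(insn[0]));
	return addr + 2 + (int64_t)(int8_t)insn[1];
}
```

For example, the bytes e2 fe at address 0x1000 encode a loop that branches back to 0x1000 itself, since the displacement is -2 from the next instruction at 0x1002.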