[PATCH] objtool,x86: Teach decode about LOOP* instructions

Peter Zijlstra posted 1 patch 3 years, 7 months ago
tools/objtool/arch/x86/decode.c | 6 ++++++
1 file changed, 6 insertions(+)
[PATCH] objtool,x86: Teach decode about LOOP* instructions
Posted by Peter Zijlstra 3 years, 7 months ago
On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> 
> > +/* Return the jump target address or 0 */
> > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > +{
> > +	switch (insn->opcode.bytes[0]) {
> > +	case 0xe0:	/* loopne */
> > +	case 0xe1:	/* loope */
> > +	case 0xe2:	/* loop */
> 
> Oh cute, objtool doesn't know about those, let me go add them.

---
Subject: objtool,x86: Teach decode about LOOP* instructions

With kprobes also needing to follow control flow, it was found that
objtool is missing the branches from the LOOP* instructions.

Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/arch/x86/decode.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c260006106be..1c253b4b7ce0 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -635,6 +635,12 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
 		*type = INSN_CONTEXT_SWITCH;
 		break;
 
+	case 0xe0: /* loopne */
+	case 0xe1: /* loope */
+	case 0xe2: /* loop */
+		*type = INSN_JUMP_CONDITIONAL;
+		break;
+
 	case 0xe8:
 		*type = INSN_CALL;
 		/*
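
For readers following the thread: the opcodes 0xe0..0xe2 (and jcxz, 0xe3)
are all short branches with an 8-bit signed displacement, so the branch
target is the address of the next instruction plus the sign-extended rel8.
Below is a minimal sketch of that arithmetic. The helper names are
hypothetical and are not part of objtool or the kernel's insn decoder; the
sketch also assumes the instruction carries no prefixes (for example no
0x67 address-size prefix), so its length is exactly 2 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: true for loopne, loope, loop and jcxz/jrcxz. */
static int is_rel8_loop_insn(uint8_t opcode)
{
	return opcode >= 0xe0 && opcode <= 0xe3;
}

/*
 * Hypothetical helper: return the branch target of a LOOP* or JCXZ
 * instruction located at insn_addr, or 0 if bytes[0] is another opcode.
 *
 * Assumptions: bytes points at at least 2 readable bytes, and the
 * instruction has no prefixes, so it is opcode + rel8 = 2 bytes long.
 */
static uint64_t rel8_branch_target(uint64_t insn_addr, const uint8_t *bytes)
{
	if (!is_rel8_loop_insn(bytes[0]))
		return 0;

	/* target = address of next instruction + sign-extended rel8 */
	return insn_addr + 2 + (int64_t)(int8_t)bytes[1];
}
```

As an example, the byte pair e2 fe (a loop with rel8 = -2) placed at 0x1000
branches back to 0x1000, i.e. to itself.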
RE: [PATCH] objtool,x86: Teach decode about LOOP* instructions
Posted by David Laight 3 years, 7 months ago
From: Peter Zijlstra
> Sent: 07 September 2022 10:01
> 
> On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> >
> > > +/* Return the jump target address or 0 */
> > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > +{
> > > +	switch (insn->opcode.bytes[0]) {
> > > +	case 0xe0:	/* loopne */
> > > +	case 0xe1:	/* loope */
> > > +	case 0xe2:	/* loop */
> >
> > Oh cute, objtool doesn't know about those, let me go add them.

Do they ever appear in the kernel?
They are so slow on Intel CPUs that finding one ought to
be deemed a bug!

Have you got jcxz (0xe3) in there?
It is fast on both Intel and AMD CPUs, so it is usable.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [PATCH] objtool,x86: Teach decode about LOOP* instructions
Posted by Peter Zijlstra 3 years, 7 months ago
On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 07 September 2022 10:01
> > 
> > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > >
> > > > +/* Return the jump target address or 0 */
> > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > +{
> > > > +	switch (insn->opcode.bytes[0]) {
> > > > +	case 0xe0:	/* loopne */
> > > > +	case 0xe1:	/* loope */
> > > > +	case 0xe2:	/* loop */
> > >
> > > Oh cute, objtool doesn't know about those, let me go add them.
> 
> Do they ever appear in the kernel?

No; that is, not on any of the random vmlinux.o images I checked this
morning.

Still, best to properly decode them anyway.
RE: [PATCH] objtool,x86: Teach decode about LOOP* instructions
Posted by David Laight 3 years, 7 months ago
From: Peter Zijlstra
> Sent: 07 September 2022 10:40
> 
> On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 07 September 2022 10:01
> > >
> > > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > > >
> > > > > +/* Return the jump target address or 0 */
> > > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > > +{
> > > > > +	switch (insn->opcode.bytes[0]) {
> > > > > +	case 0xe0:	/* loopne */
> > > > > +	case 0xe1:	/* loope */
> > > > > +	case 0xe2:	/* loop */
> > > >
> > > > Oh cute, objtool doesn't know about those, let me go add them.
> >
> > Do they ever appear in the kernel?
> 
> No; that is, not on any of the random vmlinux.o images I checked this
> morning.
> 
> Still, best to properly decode them anyway.

It is annoying that CPUs with adox/adcx have a slow loop instruction.
You really want to be able to do:
	1:	adox ...
		adcx ...
		loop	1b
That would never run at one iteration/clock,
but unrolling once would probably be enough.

What you can do (and what gives the fastest IP checksum loop) is:
	1:	jcxz	2f
		....
		lea	%rcx,...
		jmp	1b
	2:
The extra instructions mean that it needs unrolling 4 times.
I've got over 12 bytes/clock that way.

	David
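
For context on what the loops above compute: the IP checksum is the
ones'-complement of the ones'-complement sum of the data taken as 16-bit
big-endian words, and the adc/adcx/adox chains are a wide way of doing the
"add with end-around carry" step. The following is only a plain C reference
sketch of that definition, not the unrolled assembly discussed here and not
the kernel's csum code; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference helper: Internet checksum over len bytes.
 * Sums 16-bit big-endian words into a wide accumulator, pads an odd
 * trailing byte with zero, then folds the carries back in.
 */
static uint16_t ip_csum_ref(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	if (i < len)		/* odd length: last byte is the high half */
		sum += (uint32_t)data[i] << 8;

	while (sum >> 16)	/* end-around carry fold down to 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

With the bytes 00 01 f2 03 f4 f5 f6 f7 the word sum is 0x2ddf0, which folds
to 0xddf2, so the function returns 0x220d.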

[tip: objtool/core] objtool,x86: Teach decode about LOOP* instructions
Posted by tip-bot2 for Peter Zijlstra 3 years, 6 months ago
The following commit has been merged into the objtool/core branch of tip:

Commit-ID:     7a7621dfa417aa3715d2a3bd1bdd6cf5018274d0
Gitweb:        https://git.kernel.org/tip/7a7621dfa417aa3715d2a3bd1bdd6cf5018274d0
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 07 Sep 2022 11:01:20 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 15 Sep 2022 16:13:55 +02:00

objtool,x86: Teach decode about LOOP* instructions

When 'discussing' control flow Masami mentioned the LOOP* instructions
and I realized objtool doesn't decode them properly.

As it turns out, these instructions are somewhat inefficient and as
such unlikely to be emitted by the compiler (a few vmlinux.o checks
can't find a single one) so this isn't critical, but still, best to
decode them properly.

Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net
---
 tools/objtool/arch/x86/decode.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c260006..1c253b4 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -635,6 +635,12 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
 		*type = INSN_CONTEXT_SWITCH;
 		break;
 
+	case 0xe0: /* loopne */
+	case 0xe1: /* loope */
+	case 0xe2: /* loop */
+		*type = INSN_JUMP_CONDITIONAL;
+		break;
+
 	case 0xe8:
 		*type = INSN_CALL;
 		/*