When compiling with Clang on LoongArch, there exists the following objtool
warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
by Clang, given dc_fixpt_from_fraction() is not a simple function, just
mark it noinline to avoid the above issue.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
index 88d3f9d7dd55..b40c6a21460d 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
@@ -68,7 +68,7 @@ static inline unsigned long long complete_integer_division_u64(
#define GET_FRACTIONAL_PART(x) \
(FRACTIONAL_PART_MASK & (x))
-struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
+noinline struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
{
struct fixed31_32 res;
--
2.42.0
Hi, Tiezhu,
On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> When compiling with Clang on LoongArch, there exists the following objtool
> warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
>
> dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
>
> This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
> by Clang, given dc_fixpt_from_fraction() is not a simple function, just
> mark it noinline to avoid the above issue.
I don't know whether drm maintainers can accept this, because it looks
like a workaround. Yes, uninline this function "solve" a problem and
seems reasonable in this case because the function is "not simple",
but from another point of view, you may hide a type of bug.
Huacai
>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
> drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> index 88d3f9d7dd55..b40c6a21460d 100644
> --- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> +++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> @@ -68,7 +68,7 @@ static inline unsigned long long complete_integer_division_u64(
> #define GET_FRACTIONAL_PART(x) \
> (FRACTIONAL_PART_MASK & (x))
>
> -struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
> +noinline struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
> {
> struct fixed31_32 res;
>
> --
> 2.42.0
>
On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote: > Hi, Tiezhu, > > On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote: > > > > When compiling with Clang on LoongArch, there exists the following objtool > > warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o: > > > > dc_fixpt_recip() falls through to next function dc_fixpt_sinc() > > > > This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip() > > by Clang, given dc_fixpt_from_fraction() is not a simple function, just > > mark it noinline to avoid the above issue. > I don't know whether drm maintainers can accept this, because it looks > like a workaround. Yes, uninline this function "solve" a problem and > seems reasonable in this case because the function is "not simple", > but from another point of view, you may hide a type of bug. Agreed, it sounds like there's definitely a bug which this patch is papering over. -- Josh
On Wed, Dec 18, 2024 at 2:18 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote: > > On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote: > > Hi, Tiezhu, > > > > On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote: > > > > > > When compiling with Clang on LoongArch, there exists the following objtool > > > warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o: > > > > > > dc_fixpt_recip() falls through to next function dc_fixpt_sinc() > > > > > > This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip() > > > by Clang, given dc_fixpt_from_fraction() is not a simple function, just > > > mark it noinline to avoid the above issue. > > I don't know whether drm maintainers can accept this, because it looks > > like a workaround. Yes, uninline this function "solve" a problem and > > seems reasonable in this case because the function is "not simple", > > but from another point of view, you may hide a type of bug. > > Agreed, it sounds like there's definitely a bug which this patch is > papering over. Yes, agreed. Alex > > -- > Josh
On 12/19/2024 03:22 AM, Alex Deucher wrote:
> On Wed, Dec 18, 2024 at 2:18 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>
>> On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote:
>>> Hi, Tiezhu,
>>>
>>> On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>>>>
>>>> When compiling with Clang on LoongArch, there exists the following objtool
>>>> warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
>>>>
>>>> dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
>>>>
>>>> This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
>>>> by Clang, given dc_fixpt_from_fraction() is not a simple function, just
>>>> mark it noinline to avoid the above issue.
>>> I don't know whether drm maintainers can accept this, because it looks
>>> like a workaround. Yes, uninline this function "solve" a problem and
>>> seems reasonable in this case because the function is "not simple",
>>> but from another point of view, you may hide a type of bug.
>>
>> Agreed, it sounds like there's definitely a bug which this patch is
>> papering over.
>
> Yes, agreed.
Additional Info:
In order to avoid the effect of this series, I tested with kernel
6.13-rc3 without this series again.
--------------------------------------------------------------------------
Here are the test result.
1. For LoongArch
"dc_fixpt_recip() falls through to next function dc_fixpt_sinc()"
objtool warning only appears compiled with the latest mainline LLVM,
there is no this issue compiled with the latest release version such
as LLVM 19.1.6.
(1) objdump info with LLVM release version 19.1.6:
$ clang --version | head -1
clang version 19.1.6
There is an jump instruction "b" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.
0000000000000200 <dc_fixpt_recip>:
200: 40009480 beqz $a0, 148 # 294
<dc_fixpt_recip+0x94>
204: 0049fc85 srai.d $a1, $a0, 0x3f
208: 00159486 xor $a2, $a0, $a1
20c: 001194c5 sub.d $a1, $a2, $a1
210: 03800007 ori $a3, $zero, 0x0
214: 16000027 lu32i.d $a3, 1
218: 002314e6 div.du $a2, $a3, $a1
21c: 001d94c8 mul.d $a4, $a2, $a1
220: 0011a0e9 sub.d $a5, $a3, $a4
224: 02bf8008 addi.w $a4, $zero, -32
228: 03400000 andi $zero, $zero, 0x0
22c: 03400000 andi $zero, $zero, 0x0
230: 00410529 slli.d $a5, $a5, 0x1
234: 004104c6 slli.d $a2, $a2, 0x1
238: 0012952a sltu $a6, $a5, $a1
23c: 03c0054a xori $a6, $a6, 0x1
240: 001328ab maskeqz $a7, $a1, $a6
244: 0011ad29 sub.d $a5, $a5, $a7
248: 00df0108 bstrpick.d $a4, $a4, 0x1f, 0x0
24c: 02c00508 addi.d $a4, $a4, 1
250: 00149d0b and $a7, $a4, $a3
254: 001528c6 or $a2, $a2, $a6
258: 43ffd97f beqz $a7, -40 # 230
<dc_fixpt_recip+0x30>
25c: 00410527 slli.d $a3, $a5, 0x1
260: 001294e5 sltu $a1, $a3, $a1
264: 03c004a5 xori $a1, $a1, 0x1
268: 02bffc07 addi.w $a3, $zero, -1
26c: 031ffce7 lu52i.d $a3, $a3, 2047
270: 00159ca7 xor $a3, $a1, $a3
274: 680030e6 bltu $a3, $a2, 48 # 2a4
<dc_fixpt_recip+0xa4>
278: 001094c5 add.d $a1, $a2, $a1
27c: 00119406 sub.d $a2, $zero, $a1
280: 02000084 slti $a0, $a0, 0
284: 001390a5 masknez $a1, $a1, $a0
288: 001310c4 maskeqz $a0, $a2, $a0
28c: 00151484 or $a0, $a0, $a1
290: 4c000020 jirl $zero, $ra, 0
294: 00150005 or $a1, $zero, $zero
298: 002a0001 break 0x1
29c: 002a0001 break 0x1
2a0: 53ff73ff b -144 # 210
<dc_fixpt_recip+0x10>
2a4: 002a0001 break 0x1
2a8: 53ffd3ff b -48 # 278
<dc_fixpt_recip+0x78>
2ac: 03400000 andi $zero, $zero, 0x0
2b0: 03400000 andi $zero, $zero, 0x0
2b4: 03400000 andi $zero, $zero, 0x0
2b8: 03400000 andi $zero, $zero, 0x0
2bc: 03400000 andi $zero, $zero, 0x0
00000000000002c0 <dc_fixpt_sinc>:
2c0: 0049fc85 srai.d $a1, $a0, 0x3f
2c4: 00159486 xor $a2, $a0, $a1
2c8: 1490fda7 lu12i.w $a3, 296941
2cc: 039444e7 ori $a3, $a3, 0x511
2d0: 001194c6 sub.d $a2, $a2, $a1
2d4: 001500e5 or $a1, $a3, $zero
2d8: 160000c5 lu32i.d $a1, 6
2dc: 001500c9 or $a5, $a2, $zero
2e0: 00150088 or $a4, $a0, $zero
2e4: 6000a8c5 blt $a2, $a1, 168 # 38c
<dc_fixpt_sinc+0xcc>
(2) objdump info with LLVM mainline version 20.0.0git:
$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git
8daf4f16fa08b5d876e98108721dd1743a360326)
There is "break" instruction at the end of dc_fixpt_recip(), its offset
is "2a0", this "break" instruction is not dead end due to the offset
"2a4" is in the relocation section '.rela.discard.reachable', that is
to say, dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.
0000000000000200 <dc_fixpt_recip>:
200: 40009c80 beqz $a0, 156 # 29c
<dc_fixpt_recip+0x9c>
204: 0049fc85 srai.d $a1, $a0, 0x3f
208: 00159486 xor $a2, $a0, $a1
20c: 001194c5 sub.d $a1, $a2, $a1
210: 03800007 ori $a3, $zero, 0x0
214: 16000027 lu32i.d $a3, 1
218: 002314e6 div.du $a2, $a3, $a1
21c: 001d94c8 mul.d $a4, $a2, $a1
220: 0011a0e9 sub.d $a5, $a3, $a4
224: 02bf8008 addi.w $a4, $zero, -32
228: 03400000 andi $zero, $zero, 0x0
22c: 03400000 andi $zero, $zero, 0x0
230: 00410529 slli.d $a5, $a5, 0x1
234: 004104c6 slli.d $a2, $a2, 0x1
238: 0012952a sltu $a6, $a5, $a1
23c: 03c0054a xori $a6, $a6, 0x1
240: 001328ab maskeqz $a7, $a1, $a6
244: 0011ad29 sub.d $a5, $a5, $a7
248: 00df0108 bstrpick.d $a4, $a4, 0x1f, 0x0
24c: 02c00508 addi.d $a4, $a4, 1
250: 00149d0b and $a7, $a4, $a3
254: 001528c6 or $a2, $a2, $a6
258: 43ffd97f beqz $a7, -40 # 230
<dc_fixpt_recip+0x30>
25c: 00410527 slli.d $a3, $a5, 0x1
260: 001294e5 sltu $a1, $a3, $a1
264: 03c004a5 xori $a1, $a1, 0x1
268: 02bffc07 addi.w $a3, $zero, -1
26c: 031ffce7 lu52i.d $a3, $a3, 2047
270: 00159ca7 xor $a3, $a1, $a3
274: 680020e6 bltu $a3, $a2, 32 # 294
<dc_fixpt_recip+0x94>
278: 001094c5 add.d $a1, $a2, $a1
27c: 00119406 sub.d $a2, $zero, $a1
280: 02000084 slti $a0, $a0, 0
284: 001390a5 masknez $a1, $a1, $a0
288: 001310c4 maskeqz $a0, $a2, $a0
28c: 00151484 or $a0, $a0, $a1
290: 4c000020 jirl $zero, $ra, 0
294: 002a0001 break 0x1
298: 53ffe3ff b -32 # 278
<dc_fixpt_recip+0x78>
29c: 002a0001 break 0x1
2a0: 002a0001 break 0x1
2a4: 03400000 andi $zero, $zero, 0x0
2a8: 03400000 andi $zero, $zero, 0x0
2ac: 03400000 andi $zero, $zero, 0x0
2b0: 03400000 andi $zero, $zero, 0x0
2b4: 03400000 andi $zero, $zero, 0x0
2b8: 03400000 andi $zero, $zero, 0x0
2bc: 03400000 andi $zero, $zero, 0x0
00000000000002c0 <dc_fixpt_sinc>:
2c0: 0049fc85 srai.d $a1, $a0, 0x3f
2c4: 00159486 xor $a2, $a0, $a1
2c8: 001194c6 sub.d $a2, $a2, $a1
2cc: 1490fda5 lu12i.w $a1, 296941
2d0: 039444a5 ori $a1, $a1, 0x511
2d4: 160000c5 lu32i.d $a1, 6
2d8: 001500c7 or $a3, $a2, $zero
2dc: 00150088 or $a4, $a0, $zero
2e0: 600090c5 blt $a2, $a1, 144 # 370
<dc_fixpt_sinc+0xb0>
2. For x86
I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result
is same with LoongArch.
(1) objdump info with LLVM release version 19.1.6:
$ clang --version | head -1
clang version 19.1.6
There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.
0000000000000290 <dc_fixpt_recip>:
290: f3 0f 1e fa endbr64
294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9>
299: 48 85 ff test %rdi,%rdi
29c: 0f 84 a8 00 00 00 je 34a <dc_fixpt_recip+0xba>
2a2: 48 89 f9 mov %rdi,%rcx
2a5: 48 f7 d9 neg %rcx
2a8: 48 0f 48 cf cmovs %rdi,%rcx
2ac: 53 push %rbx
2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax
2b4: 00 00 00
2b7: 31 d2 xor %edx,%edx
2b9: 48 f7 f1 div %rcx
2bc: be e0 ff ff ff mov $0xffffffe0,%esi
2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e>
2c3: 45 88 c8 mov %r9b,%r8b
2c6: 4d 01 c0 add %r8,%r8
2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8
2cd: 48 29 da sub %rbx,%rdx
2d0: 45 88 da mov %r11b,%r10b
2d3: 4c 89 d0 mov %r10,%rax
2d6: 4c 09 c0 or %r8,%rax
2d9: 83 c6 02 add $0x2,%esi
2dc: 74 31 je 30f <dc_fixpt_recip+0x7f>
2de: 48 01 d2 add %rdx,%rdx
2e1: 45 31 c0 xor %r8d,%r8d
2e4: 41 ba 00 00 00 00 mov $0x0,%r10d
2ea: 48 39 ca cmp %rcx,%rdx
2ed: 41 0f 93 c1 setae %r9b
2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66>
2f3: 49 89 ca mov %rcx,%r10
2f6: 4c 29 d2 sub %r10,%rdx
2f9: 48 01 d2 add %rdx,%rdx
2fc: 45 31 d2 xor %r10d,%r10d
2ff: 48 89 cb mov %rcx,%rbx
302: 48 39 ca cmp %rcx,%rdx
305: 41 0f 93 c3 setae %r11b
309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33>
30b: 31 db xor %ebx,%ebx
30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33>
30f: 48 01 d2 add %rdx,%rdx
312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi
319: ff ff 7f
31c: 4c 8d 46 01 lea 0x1(%rsi),%r8
320: 48 39 ca cmp %rcx,%rdx
323: 4c 0f 43 c6 cmovae %rsi,%r8
327: 4c 39 c0 cmp %r8,%rax
32a: 77 29 ja 355 <dc_fixpt_recip+0xc5>
32c: 48 39 ca cmp %rcx,%rdx
32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax
333: 48 89 c1 mov %rax,%rcx
336: 48 f7 d9 neg %rcx
339: 48 85 ff test %rdi,%rdi
33c: 48 0f 49 c8 cmovns %rax,%rcx
340: 48 89 c8 mov %rcx,%rax
343: 5b pop %rbx
344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba>
34a: 0f 0b ud2
34c: 0f 0b ud2
34e: 31 c9 xor %ecx,%ecx
350: e9 57 ff ff ff jmp 2ac <dc_fixpt_recip+0x1c>
355: 0f 0b ud2
357: eb d3 jmp 32c <dc_fixpt_recip+0x9c>
359: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000000360 <.Ltmp40>:
360: 90 nop
361: 90 nop
362: 90 nop
363: 90 nop
364: 90 nop
365: 90 nop
366: 90 nop
367: 90 nop
368: 90 nop
369: 90 nop
36a: 90 nop
36b: 90 nop
36c: 90 nop
36d: 90 nop
36e: 90 nop
36f: 90 nop
0000000000000370 <dc_fixpt_sinc>:
370: f3 0f 1e fa endbr64
374: e8 00 00 00 00 call 379 <dc_fixpt_sinc+0x9>
(2) objdump info with LLVM mainline version 20.0.0git:
$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git
8daf4f16fa08b5d876e98108721dd1743a360326)
There is "ud2" instruction at the end of dc_fixpt_recip(), its offset
is "350", this "ud2" instruction is not dead end due to the offset "352"
is in the relocation section '.rela.discard.reachable', that is to say,
dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.
0000000000000290 <dc_fixpt_recip>:
290: f3 0f 1e fa endbr64
294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9>
299: 48 85 ff test %rdi,%rdi
29c: 0f 84 ac 00 00 00 je 34e <dc_fixpt_recip+0xbe>
2a2: 53 push %rbx
2a3: 48 89 f9 mov %rdi,%rcx
2a6: 48 f7 d9 neg %rcx
2a9: 48 0f 48 cf cmovs %rdi,%rcx
2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax
2b4: 00 00 00
2b7: 31 d2 xor %edx,%edx
2b9: 48 f7 f1 div %rcx
2bc: be e0 ff ff ff mov $0xffffffe0,%esi
2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e>
2c3: 45 88 c8 mov %r9b,%r8b
2c6: 4d 01 c0 add %r8,%r8
2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8
2cd: 48 29 da sub %rbx,%rdx
2d0: 45 88 da mov %r11b,%r10b
2d3: 4c 89 d0 mov %r10,%rax
2d6: 4c 09 c0 or %r8,%rax
2d9: 83 c6 02 add $0x2,%esi
2dc: 74 31 je 30f <dc_fixpt_recip+0x7f>
2de: 48 01 d2 add %rdx,%rdx
2e1: 45 31 c0 xor %r8d,%r8d
2e4: 41 ba 00 00 00 00 mov $0x0,%r10d
2ea: 48 39 ca cmp %rcx,%rdx
2ed: 41 0f 93 c1 setae %r9b
2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66>
2f3: 49 89 ca mov %rcx,%r10
2f6: 4c 29 d2 sub %r10,%rdx
2f9: 48 01 d2 add %rdx,%rdx
2fc: 45 31 d2 xor %r10d,%r10d
2ff: 48 89 cb mov %rcx,%rbx
302: 48 39 ca cmp %rcx,%rdx
305: 41 0f 93 c3 setae %r11b
309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33>
30b: 31 db xor %ebx,%ebx
30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33>
30f: 48 01 d2 add %rdx,%rdx
312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi
319: ff ff 7f
31c: 4c 8d 46 01 lea 0x1(%rsi),%r8
320: 48 39 ca cmp %rcx,%rdx
323: 4c 0f 43 c6 cmovae %rsi,%r8
327: 4c 39 c0 cmp %r8,%rax
32a: 77 1e ja 34a <dc_fixpt_recip+0xba>
32c: 48 39 ca cmp %rcx,%rdx
32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax
333: 48 89 c1 mov %rax,%rcx
336: 48 f7 d9 neg %rcx
339: 48 85 ff test %rdi,%rdi
33c: 48 0f 49 c8 cmovns %rax,%rcx
340: 48 89 c8 mov %rcx,%rax
343: 5b pop %rbx
344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba>
34a: 0f 0b ud2
34c: eb de jmp 32c <dc_fixpt_recip+0x9c>
34e: 0f 0b ud2
350: 0f 0b ud2
352: 66 66 66 66 66 2e 0f data16 data16 data16 data16 cs
nopw 0x0(%rax,%rax,1)
359: 1f 84 00 00 00 00 00
0000000000000360 <.Ltmp40>:
360: 90 nop
361: 90 nop
362: 90 nop
363: 90 nop
364: 90 nop
365: 90 nop
366: 90 nop
367: 90 nop
368: 90 nop
369: 90 nop
36a: 90 nop
36b: 90 nop
36c: 90 nop
36d: 90 nop
36e: 90 nop
36f: 90 nop
0000000000000370 <dc_fixpt_sinc>:
370: f3 0f 1e fa endbr64
374: e8 00 00 00 00 call 379 <dc_fixpt_sinc+0x9>
--------------------------------------------------------------------------
In my opinion, if there is a bug, then it is a generic rather than
arch-specific bug.
The root cause is related with LLVM or Linux kernel? I am not sure
because objtool only checks the object info and reports the warning
"falls through" if the check conditions are true.
tools/objtool/check.c
static int validate_branch(struct objtool_file *file, struct symbol *func,
struct instruction *insn, struct insn_state
state)
{
struct alternative *alt;
struct instruction *next_insn, *prev_insn = NULL;
struct section *sec;
u8 visited;
int ret;
sec = insn->sec;
while (1) {
next_insn = next_insn_to_validate(file, insn);
if (func && insn_func(insn) && func !=
insn_func(insn)->pfunc) {
/* Ignore KCFI type preambles, which always
fall through */
if (!strncmp(func->name, "__cfi_", 6) ||
!strncmp(func->name, "__pfx_", 6))
return 0;
WARN("%s() falls through to next function %s()",
func->name, insn_func(insn)->name);
return 1;
}
...
}
How to silence the warning when compiling with the latest mainline
LLVM with a proper way? Modify LLVM code or tools/objtool/check.c
or drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c?
Anyway, I think this patch can be independent of this series and can
be sent separately after the "real bug" is fixed, please ignore it now.
Thanks,
Tiezhu
On Fri, Dec 20, 2024 at 01:02:18PM +0800, Tiezhu Yang wrote: > 2. For x86 > > I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result > is same with LoongArch. Debian's clang-19 is 19.1.5. > (1) objdump info with LLVM release version 19.1.6: Please always use -r, that's ever so much more readable. > $ clang --version | head -1 > clang version 19.1.6 > > There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it > maybe jumps to a position and then steps to the return instruction, so > there is no "falls through" objtool warning. > > 0000000000000290 <dc_fixpt_recip>: > 290: f3 0f 1e fa endbr64 > 294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9> > 299: 48 85 ff test %rdi,%rdi > 29c: 0f 84 a8 00 00 00 je 34a <dc_fixpt_recip+0xba> > 2a2: 48 89 f9 mov %rdi,%rcx > 2a5: 48 f7 d9 neg %rcx > 2a8: 48 0f 48 cf cmovs %rdi,%rcx > 2ac: 53 push %rbx > 2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax > 2b4: 00 00 00 > 2b7: 31 d2 xor %edx,%edx > 2b9: 48 f7 f1 div %rcx > 2bc: be e0 ff ff ff mov $0xffffffe0,%esi > 2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e> > 2c3: 45 88 c8 mov %r9b,%r8b > 2c6: 4d 01 c0 add %r8,%r8 > 2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8 > 2cd: 48 29 da sub %rbx,%rdx > 2d0: 45 88 da mov %r11b,%r10b > 2d3: 4c 89 d0 mov %r10,%rax > 2d6: 4c 09 c0 or %r8,%rax > 2d9: 83 c6 02 add $0x2,%esi > 2dc: 74 31 je 30f <dc_fixpt_recip+0x7f> > 2de: 48 01 d2 add %rdx,%rdx > 2e1: 45 31 c0 xor %r8d,%r8d > 2e4: 41 ba 00 00 00 00 mov $0x0,%r10d > 2ea: 48 39 ca cmp %rcx,%rdx > 2ed: 41 0f 93 c1 setae %r9b > 2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66> > 2f3: 49 89 ca mov %rcx,%r10 > 2f6: 4c 29 d2 sub %r10,%rdx > 2f9: 48 01 d2 add %rdx,%rdx > 2fc: 45 31 d2 xor %r10d,%r10d > 2ff: 48 89 cb mov %rcx,%rbx > 302: 48 39 ca cmp %rcx,%rdx > 305: 41 0f 93 c3 setae %r11b > 309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33> > 30b: 31 db xor %ebx,%ebx > 30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33> > 30f: 48 01 d2 add %rdx,%rdx > 312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi > 319: ff ff 7f > 31c: 4c 8d 46 01 lea 0x1(%rsi),%r8 > 320: 48 39 ca cmp %rcx,%rdx > 323: 4c 0f 43 c6 cmovae %rsi,%r8 > 327: 4c 39 c0 cmp %r8,%rax > 32a: 77 29 ja 355 <dc_fixpt_recip+0xc5> > 32c: 48 39 ca cmp %rcx,%rdx > 32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax > 333: 48 89 c1 mov %rax,%rcx > 336: 48 f7 d9 neg %rcx > 339: 48 85 ff test %rdi,%rdi > 33c: 48 0f 49 c8 cmovns %rax,%rcx > 340: 48 89 c8 mov %rcx,%rax > 343: 5b pop %rbx > 344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba> > 34a: 0f 0b ud2 > 34c: 0f 0b ud2 > 34e: 31 c9 xor %ecx,%ecx > 350: e9 57 ff ff ff jmp 2ac <dc_fixpt_recip+0x1c> > 355: 0f 0b ud2 > 357: eb d3 jmp 32c <dc_fixpt_recip+0x9c> > 359: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) I had to puzzle a bit to get that double ud2 -- not all configs do that. Also, curse the DRM Makefiles, you can't do: make drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.s :-( They are the two ASSERT()s on line 217 and 54 respectively. They end up asserting the same value twice, so that makes sense. > $ clang --version | head -1 > clang version 20.0.0git (https://github.com/llvm/llvm-project.git > 8daf4f16fa08b5d876e98108721dd1743a360326) So I didn't have a recent build at hand.. so I've not validated the below. > There is "ud2" instruction at the end of dc_fixpt_recip(), its offset > is "350", this "ud2" instruction is not dead end due to the offset "352" > is in the relocation section '.rela.discard.reachable', that is to say, > dc_fixpt_recip() doesn't end with a return instruction or an > unconditional jump, so objtool determined that the function can fall > through into the next function, thus there is "falls through" objtool > warning. > > 0000000000000290 <dc_fixpt_recip>: > 290: f3 0f 1e fa endbr64 > 294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9> > 299: 48 85 ff test %rdi,%rdi > 29c: 0f 84 ac 00 00 00 je 34e <dc_fixpt_recip+0xbe> > 2a2: 53 push %rbx > 2a3: 48 89 f9 mov %rdi,%rcx > 2a6: 48 f7 d9 neg %rcx > 2a9: 48 0f 48 cf cmovs %rdi,%rcx > 2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax > 2b4: 00 00 00 > 2b7: 31 d2 xor %edx,%edx > 2b9: 48 f7 f1 div %rcx > 2bc: be e0 ff ff ff mov $0xffffffe0,%esi > 2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e> > 2c3: 45 88 c8 mov %r9b,%r8b > 2c6: 4d 01 c0 add %r8,%r8 > 2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8 > 2cd: 48 29 da sub %rbx,%rdx > 2d0: 45 88 da mov %r11b,%r10b > 2d3: 4c 89 d0 mov %r10,%rax > 2d6: 4c 09 c0 or %r8,%rax > 2d9: 83 c6 02 add $0x2,%esi > 2dc: 74 31 je 30f <dc_fixpt_recip+0x7f> > 2de: 48 01 d2 add %rdx,%rdx > 2e1: 45 31 c0 xor %r8d,%r8d > 2e4: 41 ba 00 00 00 00 mov $0x0,%r10d > 2ea: 48 39 ca cmp %rcx,%rdx > 2ed: 41 0f 93 c1 setae %r9b > 2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66> > 2f3: 49 89 ca mov %rcx,%r10 > 2f6: 4c 29 d2 sub %r10,%rdx > 2f9: 48 01 d2 add %rdx,%rdx > 2fc: 45 31 d2 xor %r10d,%r10d > 2ff: 48 89 cb mov %rcx,%rbx > 302: 48 39 ca cmp %rcx,%rdx > 305: 41 0f 93 c3 setae %r11b > 309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33> > 30b: 31 db xor %ebx,%ebx > 30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33> > 30f: 48 01 d2 add %rdx,%rdx > 312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi > 319: ff ff 7f > 31c: 4c 8d 46 01 lea 0x1(%rsi),%r8 > 320: 48 39 ca cmp %rcx,%rdx > 323: 4c 0f 43 c6 cmovae %rsi,%r8 > 327: 4c 39 c0 cmp %r8,%rax > 32a: 77 1e ja 34a <dc_fixpt_recip+0xba> > 32c: 48 39 ca cmp %rcx,%rdx > 32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax > 333: 48 89 c1 mov %rax,%rcx > 336: 48 f7 d9 neg %rcx > 339: 48 85 ff test %rdi,%rdi > 33c: 48 0f 49 c8 cmovns %rax,%rcx > 340: 48 89 c8 mov %rcx,%rax > 343: 5b pop %rbx > 344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba> > 34a: 0f 0b ud2 > 34c: eb de jmp 32c <dc_fixpt_recip+0x9c> > 34e: 0f 0b ud2 > 350: 0f 0b ud2 > 352: 66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw > 0x0(%rax,%rax,1) > 359: 1f 84 00 00 00 00 00 If you put them size-by-side, you'll see it's more or less the same code-gen (trivial differences), but now it just stops code-gen, where previously it would continue. So this really is a compiler problem, this needs no annotation, it's straight up broken. Now, the thing is, these ASSERT()s are checking for divide-by-zero, I suspect clang figured that out and invokes UB on us and just stops code-gen. Nathan, Nick, don't we have a compiler flag that forces __builtin_trap() whenever clang pulls something like this? I think UBSAN does this, but we really shouldn't pull in the whole of that for sanity.
On Fri, Dec 20, 2024 at 11:31:00AM +0100, Peter Zijlstra wrote:
> Also, curse the DRM Makefiles, you can't do:
>
> make drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.s
Small tip: You can get the path of the target by building
drivers/gpu/drm/amd/amdgpu/ and finding it in the output. In this case,
it'd be
$ make drivers/gpu/drm/amd/amdgpu/../display/dc/basics/fixpt31_32.s
Not excusing that it does not work as it should but sometimes you have
to work with what you can *shrug*
> > $ clang --version | head -1
> > clang version 20.0.0git (https://github.com/llvm/llvm-project.git
> > 8daf4f16fa08b5d876e98108721dd1743a360326)
>
> So I didn't have a recent build at hand.. so I've not validated the
> below.
...
> If you put them size-by-side, you'll see it's more or less the same
> code-gen (trivial differences), but now it just stops code-gen, where
> previously it would continue.
>
> So this really is a compiler problem, this needs no annotation, it's
> straight up broken.
>
> Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
> suspect clang figured that out and invokes UB on us and just stops
> code-gen.
Yeah, I think your analysis is spot on, as this was introduced by a
change in clang from a few months ago according to my bisect:
https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448
Since the ASSERT does not do anything to prevent the divide by zero (it
just flags it with WARN_ON) and the rest of the code doesn't either, I
assume that the codegen stops as soon as it encounters the unreachable
that change created from the path where divide by zero would occur via
dc_fixpt_recip() ->
dc_fixpt_from_fraction() ->
complete_integer_division_u64() ->
div64_u64_rem()
Shouldn't callers of division functions harden them against dividing by
zero?
> Nathan, Nick, don't we have a compiler flag that forces __builtin_trap()
> whenever clang pulls something like this? I think UBSAN does this, but
> we really shouldn't pull in the whole of that for sanity.
Right, I think that LLVM has a hidden flag for this:
-mllvm -trap-unreachable
That makes this particular warning disappear.
It isn't the greatest because '-mllvm' flags need to be passed along to
the linker for LTO but that's easy enough to deal with. I know we have
talked about enabling that flag in the past but I cannot remember why we
decided against it (maybe code size concerns and other optimization
restrictions)? It looks like GCC has a similar flag,
-funreachable-traps.
Cheers,
Nathan
On Fri, 2024-12-20 at 15:34 -0700, Nathan Chancellor wrote: > > Now, the thing is, these ASSERT()s are checking for divide-by-zero, I > > suspect clang figured that out and invokes UB on us and just stops > > code-gen. > > Yeah, I think your analysis is spot on, as this was introduced by a > change in clang from a few months ago according to my bisect: > > https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448 > > Since the ASSERT does not do anything to prevent the divide by zero (it > just flags it with WARN_ON) and the rest of the code doesn't either, I > assume that the codegen stops as soon as it encounters the unreachable > that change created from the path where divide by zero would occur via > > dc_fixpt_recip() -> > dc_fixpt_from_fraction() -> > complete_integer_division_u64() -> > div64_u64_rem() > > Shouldn't callers of division functions harden them against dividing by > zero? Yes I think it'd be the correct solution. -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University
On 12/21/2024 03:40 PM, Xi Ruoyao wrote:
> On Fri, 2024-12-20 at 15:34 -0700, Nathan Chancellor wrote:
>>> Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
>>> suspect clang figured that out and invokes UB on us and just stops
>>> code-gen.
>>
>> Yeah, I think your analysis is spot on, as this was introduced by a
>> change in clang from a few months ago according to my bisect:
>>
>> https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448
>>
>> Since the ASSERT does not do anything to prevent the divide by zero (it
>> just flags it with WARN_ON) and the rest of the code doesn't either, I
>> assume that the codegen stops as soon as it encounters the unreachable
>> that change created from the path where divide by zero would occur via
>>
>> dc_fixpt_recip() ->
>> dc_fixpt_from_fraction() ->
>> complete_integer_division_u64() ->
>> div64_u64_rem()
>>
>> Shouldn't callers of division functions harden them against dividing by
>> zero?
>
> Yes I think it'd be the correct solution.
Thank you all. Do you mean like this?
--- >8 ---
diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
index 88d3f9d7dd55..848d8e67304a 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
@@ -79,11 +79,13 @@ struct fixed31_32 dc_fixpt_from_fraction(long long
numerator, long long denomina
unsigned long long arg2_value = arg2_negative ? -denominator :
denominator;
unsigned long long remainder;
+ unsigned long long res_value;
/* determine integer part */
- unsigned long long res_value = complete_integer_division_u64(
- arg1_value, arg2_value, &remainder);
+ ASSERT(arg2_value);
+
+ res_value = complete_integer_division_u64(arg1_value,
arg2_value, &remainder);
ASSERT(res_value <= LONG_MAX);
@@ -214,8 +216,6 @@ struct fixed31_32 dc_fixpt_recip(struct fixed31_32 arg)
* Good idea to use Newton's method
*/
- ASSERT(arg.value);
-
return dc_fixpt_from_fraction(
dc_fixpt_one.value,
arg.value);
With the above changes, there is no "falls through" objtool warning
compiled with both clang 19 and the latest mainline clang 20.
If you are OK with it, I will send a separate formal patch to handle
this issue after doing some more testing.
Thanks,
Tiezhu
On Sun, Dec 22, 2024 at 12:27:47PM +0800, Tiezhu Yang wrote: > On 12/21/2024 03:40 PM, Xi Ruoyao wrote: > > On Fri, 2024-12-20 at 15:34 -0700, Nathan Chancellor wrote: > > > > Now, the thing is, these ASSERT()s are checking for divide-by-zero, I > > > > suspect clang figured that out and invokes UB on us and just stops > > > > code-gen. > > > > > > Yeah, I think your analysis is spot on, as this was introduced by a > > > change in clang from a few months ago according to my bisect: > > > > > > https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448 > > > > > > Since the ASSERT does not do anything to prevent the divide by zero (it > > > just flags it with WARN_ON) and the rest of the code doesn't either, I > > > assume that the codegen stops as soon as it encounters the unreachable > > > that change created from the path where divide by zero would occur via > > > > > > dc_fixpt_recip() -> > > > dc_fixpt_from_fraction() -> > > > complete_integer_division_u64() -> > > > div64_u64_rem() > > > > > > Shouldn't callers of division functions harden them against dividing by > > > zero? > > > > Yes I think it'd be the correct solution. > > Thank you all. Do you mean like this? > > --- >8 --- > > diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c > b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c > index 88d3f9d7dd55..848d8e67304a 100644 > --- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c > +++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c > @@ -79,11 +79,13 @@ struct fixed31_32 dc_fixpt_from_fraction(long long > numerator, long long denomina > unsigned long long arg2_value = arg2_negative ? -denominator : > denominator; > > unsigned long long remainder; > + unsigned long long res_value; > > /* determine integer part */ > > - unsigned long long res_value = complete_integer_division_u64( > - arg1_value, arg2_value, &remainder); > + ASSERT(arg2_value); > + > + res_value = complete_integer_division_u64(arg1_value, arg2_value, > &remainder); > > ASSERT(res_value <= LONG_MAX); > > @@ -214,8 +216,6 @@ struct fixed31_32 dc_fixpt_recip(struct fixed31_32 arg) > * Good idea to use Newton's method > */ > > - ASSERT(arg.value); > - > return dc_fixpt_from_fraction( > dc_fixpt_one.value, > arg.value); > > With the above changes, there is no "falls through" objtool warning > compiled with both clang 19 and the latest mainline clang 20. I am somewhat surprised that changes anything because the ASSERT is not stopping control flow so I would expect the same problem as before. I guess it does not happen perhaps due to inlining differences? I looked at this code briefly when I sent my initial message and I was not sure where such a check should exist. It does not look like these functions really do any sort of error handling. > If you are OK with it, I will send a separate formal patch to handle > this issue after doing some more testing. It may still be worth doing this to get some initial thoughts from the AMD DRM folks. Cheers, Nathan
On 12/24/2024 05:46 AM, Nathan Chancellor wrote:
> On Sun, Dec 22, 2024 at 12:27:47PM +0800, Tiezhu Yang wrote:
...
>> With the above changes, there is no "falls through" objtool warning
>> compiled with both clang 19 and the latest mainline clang 20.
>
> I am somewhat surprised that changes anything because the ASSERT is not
> stopping control flow so I would expect the same problem as before. I
> guess it does not happen perhaps due to inlining differences? I looked
It is weird and I think it is not the correct way.
> at this code briefly when I sent my initial message and I was not sure
> where such a check should exist. It does not look like these functions
> really do any sort of error handling.
>
>> If you are OK with it, I will send a separate formal patch to handle
>> this issue after doing some more testing.
>
> It may still be worth doing this to get some initial thoughts from the
> AMD DRM folks.
I think the correct way is:
Keep the current ASSERT for the aim of debugging, just add BUG() to
stop control flow if the divisor is zero.
--- >8 ---
diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
index 88d3f9d7dd55..e15391e36b40 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
@@ -52,6 +52,7 @@ static inline unsigned long long
complete_integer_division_u64(
unsigned long long result;
ASSERT(divisor);
+ BUG_ON(!divisor);
result = div64_u64_rem(dividend, divisor, remainder);
diff --git a/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
b/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
index 131f1e3949d3..ce2036950808 100644
--- a/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
@@ -30,6 +30,7 @@ static inline unsigned long long
spl_complete_integer_division_u64(
unsigned long long result;
SPL_ASSERT(divisor);
+ BUG_ON(!divisor);
result = spl_div64_u64_rem(dividend, divisor, remainder);
It looks reasonable and works well both on x86 and LoongArch, there are
no the following objtool warnings:
dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
spl_fixpt_recip() falls through to next function spl_fixpt_sinc()
If no more comments, I will send a separate formal patch for your
review in the next week.
Thanks,
Tiezhu
© 2016 - 2025 Red Hat, Inc.