[PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline

Tiezhu Yang posted 9 patches 1 year ago
There is a newer version of this series
[PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Tiezhu Yang 1 year ago
When compiling with Clang on LoongArch, there exists the following objtool
warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:

  dc_fixpt_recip() falls through to next function dc_fixpt_sinc()

This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
by Clang, given dc_fixpt_from_fraction() is not a simple function, just
mark it noinline to avoid the above issue.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
index 88d3f9d7dd55..b40c6a21460d 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
@@ -68,7 +68,7 @@ static inline unsigned long long complete_integer_division_u64(
 #define GET_FRACTIONAL_PART(x) \
 	(FRACTIONAL_PART_MASK & (x))
 
-struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
+noinline struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
 {
 	struct fixed31_32 res;
 
-- 
2.42.0
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Huacai Chen 12 months ago
Hi, Tiezhu,

On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> When compiling with Clang on LoongArch, there exists the following objtool
> warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
>
>   dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
>
> This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
> by Clang, given dc_fixpt_from_fraction() is not a simple function, just
> mark it noinline to avoid the above issue.
I don't know whether drm maintainers can accept this, because it looks
like a workaround. Yes, uninline this function "solve" a problem and
seems reasonable in this case because the function is "not simple",
but from another point of view, you may hide a type of bug.


Huacai

>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> index 88d3f9d7dd55..b40c6a21460d 100644
> --- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> +++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> @@ -68,7 +68,7 @@ static inline unsigned long long complete_integer_division_u64(
>  #define GET_FRACTIONAL_PART(x) \
>         (FRACTIONAL_PART_MASK & (x))
>
> -struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
> +noinline struct fixed31_32 dc_fixpt_from_fraction(long long numerator, long long denominator)
>  {
>         struct fixed31_32 res;
>
> --
> 2.42.0
>
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Josh Poimboeuf 12 months ago
On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote:
> Hi, Tiezhu,
> 
> On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
> >
> > When compiling with Clang on LoongArch, there exists the following objtool
> > warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
> >
> >   dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
> >
> > This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
> > by Clang, given dc_fixpt_from_fraction() is not a simple function, just
> > mark it noinline to avoid the above issue.
> I don't know whether drm maintainers can accept this, because it looks
> like a workaround. Yes, uninline this function "solve" a problem and
> seems reasonable in this case because the function is "not simple",
> but from another point of view, you may hide a type of bug.

Agreed, it sounds like there's definitely a bug which this patch is
papering over.

-- 
Josh
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Alex Deucher 12 months ago
On Wed, Dec 18, 2024 at 2:18 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote:
> > Hi, Tiezhu,
> >
> > On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
> > >
> > > When compiling with Clang on LoongArch, there exists the following objtool
> > > warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
> > >
> > >   dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
> > >
> > > This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
> > > by Clang, given dc_fixpt_from_fraction() is not a simple function, just
> > > mark it noinline to avoid the above issue.
> > I don't know whether drm maintainers can accept this, because it looks
> > like a workaround. Yes, uninline this function "solve" a problem and
> > seems reasonable in this case because the function is "not simple",
> > but from another point of view, you may hide a type of bug.
>
> Agreed, it sounds like there's definitely a bug which this patch is
> papering over.

Yes, agreed.

Alex

>
> --
> Josh
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Tiezhu Yang 12 months ago
On 12/19/2024 03:22 AM, Alex Deucher wrote:
> On Wed, Dec 18, 2024 at 2:18 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>
>> On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote:
>>> Hi, Tiezhu,
>>>
>>> On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>>>>
>>>> When compiling with Clang on LoongArch, there exists the following objtool
>>>> warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
>>>>
>>>>   dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
>>>>
>>>> This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
>>>> by Clang, given dc_fixpt_from_fraction() is not a simple function, just
>>>> mark it noinline to avoid the above issue.
>>> I don't know whether drm maintainers can accept this, because it looks
>>> like a workaround. Yes, uninline this function "solve" a problem and
>>> seems reasonable in this case because the function is "not simple",
>>> but from another point of view, you may hide a type of bug.
>>
>> Agreed, it sounds like there's definitely a bug which this patch is
>> papering over.
>
> Yes, agreed.

Additional Info:

In order to avoid the effect of this series, I tested with kernel
6.13-rc3 without this series again.

--------------------------------------------------------------------------
Here are the test result.

1. For LoongArch

"dc_fixpt_recip() falls through to next function dc_fixpt_sinc()"
objtool warning only appears compiled with the latest mainline LLVM,
there is no this issue compiled with the latest release version such
as LLVM 19.1.6.

(1) objdump info with LLVM release version 19.1.6:

$ clang --version | head -1
clang version 19.1.6

There is an jump instruction "b" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.

0000000000000200 <dc_fixpt_recip>:
      200:       40009480        beqz            $a0, 148        # 294 
<dc_fixpt_recip+0x94>
      204:       0049fc85        srai.d          $a1, $a0, 0x3f
      208:       00159486        xor             $a2, $a0, $a1
      20c:       001194c5        sub.d           $a1, $a2, $a1
      210:       03800007        ori             $a3, $zero, 0x0
      214:       16000027        lu32i.d         $a3, 1
      218:       002314e6        div.du          $a2, $a3, $a1
      21c:       001d94c8        mul.d           $a4, $a2, $a1
      220:       0011a0e9        sub.d           $a5, $a3, $a4
      224:       02bf8008        addi.w          $a4, $zero, -32
      228:       03400000        andi            $zero, $zero, 0x0
      22c:       03400000        andi            $zero, $zero, 0x0
      230:       00410529        slli.d          $a5, $a5, 0x1
      234:       004104c6        slli.d          $a2, $a2, 0x1
      238:       0012952a        sltu            $a6, $a5, $a1
      23c:       03c0054a        xori            $a6, $a6, 0x1
      240:       001328ab        maskeqz         $a7, $a1, $a6
      244:       0011ad29        sub.d           $a5, $a5, $a7
      248:       00df0108        bstrpick.d      $a4, $a4, 0x1f, 0x0
      24c:       02c00508        addi.d          $a4, $a4, 1
      250:       00149d0b        and             $a7, $a4, $a3
      254:       001528c6        or              $a2, $a2, $a6
      258:       43ffd97f        beqz            $a7, -40        # 230 
<dc_fixpt_recip+0x30>
      25c:       00410527        slli.d          $a3, $a5, 0x1
      260:       001294e5        sltu            $a1, $a3, $a1
      264:       03c004a5        xori            $a1, $a1, 0x1
      268:       02bffc07        addi.w          $a3, $zero, -1
      26c:       031ffce7        lu52i.d         $a3, $a3, 2047
      270:       00159ca7        xor             $a3, $a1, $a3
      274:       680030e6        bltu            $a3, $a2, 48    # 2a4 
<dc_fixpt_recip+0xa4>
      278:       001094c5        add.d           $a1, $a2, $a1
      27c:       00119406        sub.d           $a2, $zero, $a1
      280:       02000084        slti            $a0, $a0, 0
      284:       001390a5        masknez         $a1, $a1, $a0
      288:       001310c4        maskeqz         $a0, $a2, $a0
      28c:       00151484        or              $a0, $a0, $a1
      290:       4c000020        jirl            $zero, $ra, 0
      294:       00150005        or              $a1, $zero, $zero
      298:       002a0001        break           0x1
      29c:       002a0001        break           0x1
      2a0:       53ff73ff        b               -144    # 210 
<dc_fixpt_recip+0x10>
      2a4:       002a0001        break           0x1
      2a8:       53ffd3ff        b               -48     # 278 
<dc_fixpt_recip+0x78>
      2ac:       03400000        andi            $zero, $zero, 0x0
      2b0:       03400000        andi            $zero, $zero, 0x0
      2b4:       03400000        andi            $zero, $zero, 0x0
      2b8:       03400000        andi            $zero, $zero, 0x0
      2bc:       03400000        andi            $zero, $zero, 0x0

00000000000002c0 <dc_fixpt_sinc>:
      2c0:       0049fc85        srai.d          $a1, $a0, 0x3f
      2c4:       00159486        xor             $a2, $a0, $a1
      2c8:       1490fda7        lu12i.w         $a3, 296941
      2cc:       039444e7        ori             $a3, $a3, 0x511
      2d0:       001194c6        sub.d           $a2, $a2, $a1
      2d4:       001500e5        or              $a1, $a3, $zero
      2d8:       160000c5        lu32i.d         $a1, 6
      2dc:       001500c9        or              $a5, $a2, $zero
      2e0:       00150088        or              $a4, $a0, $zero
      2e4:       6000a8c5        blt             $a2, $a1, 168   # 38c 
<dc_fixpt_sinc+0xcc>

(2) objdump info with LLVM mainline version 20.0.0git:

$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 
8daf4f16fa08b5d876e98108721dd1743a360326)

There is "break" instruction at the end of dc_fixpt_recip(), its offset
is "2a0", this "break" instruction is not dead end due to the offset
"2a4" is in the relocation section '.rela.discard.reachable', that is
to say, dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.

0000000000000200 <dc_fixpt_recip>:
      200:       40009c80        beqz            $a0, 156        # 29c 
<dc_fixpt_recip+0x9c>
      204:       0049fc85        srai.d          $a1, $a0, 0x3f
      208:       00159486        xor             $a2, $a0, $a1
      20c:       001194c5        sub.d           $a1, $a2, $a1
      210:       03800007        ori             $a3, $zero, 0x0
      214:       16000027        lu32i.d         $a3, 1
      218:       002314e6        div.du          $a2, $a3, $a1
      21c:       001d94c8        mul.d           $a4, $a2, $a1
      220:       0011a0e9        sub.d           $a5, $a3, $a4
      224:       02bf8008        addi.w          $a4, $zero, -32
      228:       03400000        andi            $zero, $zero, 0x0
      22c:       03400000        andi            $zero, $zero, 0x0
      230:       00410529        slli.d          $a5, $a5, 0x1
      234:       004104c6        slli.d          $a2, $a2, 0x1
      238:       0012952a        sltu            $a6, $a5, $a1
      23c:       03c0054a        xori            $a6, $a6, 0x1
      240:       001328ab        maskeqz         $a7, $a1, $a6
      244:       0011ad29        sub.d           $a5, $a5, $a7
      248:       00df0108        bstrpick.d      $a4, $a4, 0x1f, 0x0
      24c:       02c00508        addi.d          $a4, $a4, 1
      250:       00149d0b        and             $a7, $a4, $a3
      254:       001528c6        or              $a2, $a2, $a6
      258:       43ffd97f        beqz            $a7, -40        # 230 
<dc_fixpt_recip+0x30>
      25c:       00410527        slli.d          $a3, $a5, 0x1
      260:       001294e5        sltu            $a1, $a3, $a1
      264:       03c004a5        xori            $a1, $a1, 0x1
      268:       02bffc07        addi.w          $a3, $zero, -1
      26c:       031ffce7        lu52i.d         $a3, $a3, 2047
      270:       00159ca7        xor             $a3, $a1, $a3
      274:       680020e6        bltu            $a3, $a2, 32    # 294 
<dc_fixpt_recip+0x94>
      278:       001094c5        add.d           $a1, $a2, $a1
      27c:       00119406        sub.d           $a2, $zero, $a1
      280:       02000084        slti            $a0, $a0, 0
      284:       001390a5        masknez         $a1, $a1, $a0
      288:       001310c4        maskeqz         $a0, $a2, $a0
      28c:       00151484        or              $a0, $a0, $a1
      290:       4c000020        jirl            $zero, $ra, 0
      294:       002a0001        break           0x1
      298:       53ffe3ff        b               -32     # 278 
<dc_fixpt_recip+0x78>
      29c:       002a0001        break           0x1
      2a0:       002a0001        break           0x1
      2a4:       03400000        andi            $zero, $zero, 0x0
      2a8:       03400000        andi            $zero, $zero, 0x0
      2ac:       03400000        andi            $zero, $zero, 0x0
      2b0:       03400000        andi            $zero, $zero, 0x0
      2b4:       03400000        andi            $zero, $zero, 0x0
      2b8:       03400000        andi            $zero, $zero, 0x0
      2bc:       03400000        andi            $zero, $zero, 0x0

00000000000002c0 <dc_fixpt_sinc>:
      2c0:       0049fc85        srai.d          $a1, $a0, 0x3f
      2c4:       00159486        xor             $a2, $a0, $a1
      2c8:       001194c6        sub.d           $a2, $a2, $a1
      2cc:       1490fda5        lu12i.w         $a1, 296941
      2d0:       039444a5        ori             $a1, $a1, 0x511
      2d4:       160000c5        lu32i.d         $a1, 6
      2d8:       001500c7        or              $a3, $a2, $zero
      2dc:       00150088        or              $a4, $a0, $zero
      2e0:       600090c5        blt             $a2, $a1, 144   # 370 
<dc_fixpt_sinc+0xb0>

2. For x86

I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result
is same with LoongArch.

(1) objdump info with LLVM release version 19.1.6:

$ clang --version | head -1
clang version 19.1.6

There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.

0000000000000290 <dc_fixpt_recip>:
      290:       f3 0f 1e fa             endbr64
      294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
      299:       48 85 ff                test   %rdi,%rdi
      29c:       0f 84 a8 00 00 00       je     34a <dc_fixpt_recip+0xba>
      2a2:       48 89 f9                mov    %rdi,%rcx
      2a5:       48 f7 d9                neg    %rcx
      2a8:       48 0f 48 cf             cmovs  %rdi,%rcx
      2ac:       53                      push   %rbx
      2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
      2b4:       00 00 00
      2b7:       31 d2                   xor    %edx,%edx
      2b9:       48 f7 f1                div    %rcx
      2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
      2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
      2c3:       45 88 c8                mov    %r9b,%r8b
      2c6:       4d 01 c0                add    %r8,%r8
      2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
      2cd:       48 29 da                sub    %rbx,%rdx
      2d0:       45 88 da                mov    %r11b,%r10b
      2d3:       4c 89 d0                mov    %r10,%rax
      2d6:       4c 09 c0                or     %r8,%rax
      2d9:       83 c6 02                add    $0x2,%esi
      2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
      2de:       48 01 d2                add    %rdx,%rdx
      2e1:       45 31 c0                xor    %r8d,%r8d
      2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
      2ea:       48 39 ca                cmp    %rcx,%rdx
      2ed:       41 0f 93 c1             setae  %r9b
      2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
      2f3:       49 89 ca                mov    %rcx,%r10
      2f6:       4c 29 d2                sub    %r10,%rdx
      2f9:       48 01 d2                add    %rdx,%rdx
      2fc:       45 31 d2                xor    %r10d,%r10d
      2ff:       48 89 cb                mov    %rcx,%rbx
      302:       48 39 ca                cmp    %rcx,%rdx
      305:       41 0f 93 c3             setae  %r11b
      309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
      30b:       31 db                   xor    %ebx,%ebx
      30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
      30f:       48 01 d2                add    %rdx,%rdx
      312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
      319:       ff ff 7f
      31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
      320:       48 39 ca                cmp    %rcx,%rdx
      323:       4c 0f 43 c6             cmovae %rsi,%r8
      327:       4c 39 c0                cmp    %r8,%rax
      32a:       77 29                   ja     355 <dc_fixpt_recip+0xc5>
      32c:       48 39 ca                cmp    %rcx,%rdx
      32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
      333:       48 89 c1                mov    %rax,%rcx
      336:       48 f7 d9                neg    %rcx
      339:       48 85 ff                test   %rdi,%rdi
      33c:       48 0f 49 c8             cmovns %rax,%rcx
      340:       48 89 c8                mov    %rcx,%rax
      343:       5b                      pop    %rbx
      344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
      34a:       0f 0b                   ud2
      34c:       0f 0b                   ud2
      34e:       31 c9                   xor    %ecx,%ecx
      350:       e9 57 ff ff ff          jmp    2ac <dc_fixpt_recip+0x1c>
      355:       0f 0b                   ud2
      357:       eb d3                   jmp    32c <dc_fixpt_recip+0x9c>
      359:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000000360 <.Ltmp40>:
      360:       90                      nop
      361:       90                      nop
      362:       90                      nop
      363:       90                      nop
      364:       90                      nop
      365:       90                      nop
      366:       90                      nop
      367:       90                      nop
      368:       90                      nop
      369:       90                      nop
      36a:       90                      nop
      36b:       90                      nop
      36c:       90                      nop
      36d:       90                      nop
      36e:       90                      nop
      36f:       90                      nop

0000000000000370 <dc_fixpt_sinc>:
      370:       f3 0f 1e fa             endbr64
      374:       e8 00 00 00 00          call   379 <dc_fixpt_sinc+0x9>

(2) objdump info with LLVM mainline version 20.0.0git:

$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 
8daf4f16fa08b5d876e98108721dd1743a360326)

There is "ud2" instruction at the end of dc_fixpt_recip(), its offset
is "350", this "ud2" instruction is not dead end due to the offset "352"
is in the relocation section '.rela.discard.reachable', that is to say,
dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.

0000000000000290 <dc_fixpt_recip>:
      290:       f3 0f 1e fa             endbr64
      294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
      299:       48 85 ff                test   %rdi,%rdi
      29c:       0f 84 ac 00 00 00       je     34e <dc_fixpt_recip+0xbe>
      2a2:       53                      push   %rbx
      2a3:       48 89 f9                mov    %rdi,%rcx
      2a6:       48 f7 d9                neg    %rcx
      2a9:       48 0f 48 cf             cmovs  %rdi,%rcx
      2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
      2b4:       00 00 00
      2b7:       31 d2                   xor    %edx,%edx
      2b9:       48 f7 f1                div    %rcx
      2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
      2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
      2c3:       45 88 c8                mov    %r9b,%r8b
      2c6:       4d 01 c0                add    %r8,%r8
      2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
      2cd:       48 29 da                sub    %rbx,%rdx
      2d0:       45 88 da                mov    %r11b,%r10b
      2d3:       4c 89 d0                mov    %r10,%rax
      2d6:       4c 09 c0                or     %r8,%rax
      2d9:       83 c6 02                add    $0x2,%esi
      2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
      2de:       48 01 d2                add    %rdx,%rdx
      2e1:       45 31 c0                xor    %r8d,%r8d
      2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
      2ea:       48 39 ca                cmp    %rcx,%rdx
      2ed:       41 0f 93 c1             setae  %r9b
      2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
      2f3:       49 89 ca                mov    %rcx,%r10
      2f6:       4c 29 d2                sub    %r10,%rdx
      2f9:       48 01 d2                add    %rdx,%rdx
      2fc:       45 31 d2                xor    %r10d,%r10d
      2ff:       48 89 cb                mov    %rcx,%rbx
      302:       48 39 ca                cmp    %rcx,%rdx
      305:       41 0f 93 c3             setae  %r11b
      309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
      30b:       31 db                   xor    %ebx,%ebx
      30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
      30f:       48 01 d2                add    %rdx,%rdx
      312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
      319:       ff ff 7f
      31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
      320:       48 39 ca                cmp    %rcx,%rdx
      323:       4c 0f 43 c6             cmovae %rsi,%r8
      327:       4c 39 c0                cmp    %r8,%rax
      32a:       77 1e                   ja     34a <dc_fixpt_recip+0xba>
      32c:       48 39 ca                cmp    %rcx,%rdx
      32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
      333:       48 89 c1                mov    %rax,%rcx
      336:       48 f7 d9                neg    %rcx
      339:       48 85 ff                test   %rdi,%rdi
      33c:       48 0f 49 c8             cmovns %rax,%rcx
      340:       48 89 c8                mov    %rcx,%rax
      343:       5b                      pop    %rbx
      344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
      34a:       0f 0b                   ud2
      34c:       eb de                   jmp    32c <dc_fixpt_recip+0x9c>
      34e:       0f 0b                   ud2
      350:       0f 0b                   ud2
      352:       66 66 66 66 66 2e 0f    data16 data16 data16 data16 cs 
nopw 0x0(%rax,%rax,1)
      359:       1f 84 00 00 00 00 00

0000000000000360 <.Ltmp40>:
      360:       90                      nop
      361:       90                      nop
      362:       90                      nop
      363:       90                      nop
      364:       90                      nop
      365:       90                      nop
      366:       90                      nop
      367:       90                      nop
      368:       90                      nop
      369:       90                      nop
      36a:       90                      nop
      36b:       90                      nop
      36c:       90                      nop
      36d:       90                      nop
      36e:       90                      nop
      36f:       90                      nop

0000000000000370 <dc_fixpt_sinc>:
      370:       f3 0f 1e fa             endbr64
      374:       e8 00 00 00 00          call   379 <dc_fixpt_sinc+0x9>
--------------------------------------------------------------------------

In my opinion, if there is a bug, then it is a generic rather than
arch-specific bug.

The root cause is related with LLVM or Linux kernel? I am not sure
because objtool only checks the object info and reports the warning
"falls through" if the check conditions are true.

tools/objtool/check.c
static int validate_branch(struct objtool_file *file, struct symbol *func,
                            struct instruction *insn, struct insn_state 
state)
{
         struct alternative *alt;
         struct instruction *next_insn, *prev_insn = NULL;
         struct section *sec;
         u8 visited;
         int ret;

         sec = insn->sec;

         while (1) {
                 next_insn = next_insn_to_validate(file, insn);

                 if (func && insn_func(insn) && func != 
insn_func(insn)->pfunc) {
                         /* Ignore KCFI type preambles, which always 
fall through */
                         if (!strncmp(func->name, "__cfi_", 6) ||
                             !strncmp(func->name, "__pfx_", 6))
                                 return 0;

                         WARN("%s() falls through to next function %s()",
                              func->name, insn_func(insn)->name);
                         return 1;
                 }
         ...
}

How to silence the warning when compiling with the latest mainline
LLVM with a proper way? Modify LLVM code or tools/objtool/check.c
or drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c?

Anyway, I think this patch can be independent of this series and can
be sent separately after the "real bug" is fixed, please ignore it now.

Thanks,
Tiezhu

Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Peter Zijlstra 12 months ago
On Fri, Dec 20, 2024 at 01:02:18PM +0800, Tiezhu Yang wrote:

> 2. For x86
> 
> I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result
> is same with LoongArch.

Debian's clang-19 is 19.1.5.

> (1) objdump info with LLVM release version 19.1.6:

Please always use -r, that's ever so much more readable.

> $ clang --version | head -1
> clang version 19.1.6
> 
> There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it
> maybe jumps to a position and then steps to the return instruction, so
> there is no "falls through" objtool warning.
> 
> 0000000000000290 <dc_fixpt_recip>:
>      290:       f3 0f 1e fa             endbr64
>      294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
>      299:       48 85 ff                test   %rdi,%rdi
>      29c:       0f 84 a8 00 00 00       je     34a <dc_fixpt_recip+0xba>
>      2a2:       48 89 f9                mov    %rdi,%rcx
>      2a5:       48 f7 d9                neg    %rcx
>      2a8:       48 0f 48 cf             cmovs  %rdi,%rcx
>      2ac:       53                      push   %rbx
>      2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
>      2b4:       00 00 00
>      2b7:       31 d2                   xor    %edx,%edx
>      2b9:       48 f7 f1                div    %rcx
>      2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
>      2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
>      2c3:       45 88 c8                mov    %r9b,%r8b
>      2c6:       4d 01 c0                add    %r8,%r8
>      2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
>      2cd:       48 29 da                sub    %rbx,%rdx
>      2d0:       45 88 da                mov    %r11b,%r10b
>      2d3:       4c 89 d0                mov    %r10,%rax
>      2d6:       4c 09 c0                or     %r8,%rax
>      2d9:       83 c6 02                add    $0x2,%esi
>      2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
>      2de:       48 01 d2                add    %rdx,%rdx
>      2e1:       45 31 c0                xor    %r8d,%r8d
>      2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
>      2ea:       48 39 ca                cmp    %rcx,%rdx
>      2ed:       41 0f 93 c1             setae  %r9b
>      2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
>      2f3:       49 89 ca                mov    %rcx,%r10
>      2f6:       4c 29 d2                sub    %r10,%rdx
>      2f9:       48 01 d2                add    %rdx,%rdx
>      2fc:       45 31 d2                xor    %r10d,%r10d
>      2ff:       48 89 cb                mov    %rcx,%rbx
>      302:       48 39 ca                cmp    %rcx,%rdx
>      305:       41 0f 93 c3             setae  %r11b
>      309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
>      30b:       31 db                   xor    %ebx,%ebx
>      30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
>      30f:       48 01 d2                add    %rdx,%rdx
>      312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
>      319:       ff ff 7f
>      31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
>      320:       48 39 ca                cmp    %rcx,%rdx
>      323:       4c 0f 43 c6             cmovae %rsi,%r8
>      327:       4c 39 c0                cmp    %r8,%rax
>      32a:       77 29                   ja     355 <dc_fixpt_recip+0xc5>
>      32c:       48 39 ca                cmp    %rcx,%rdx
>      32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
>      333:       48 89 c1                mov    %rax,%rcx
>      336:       48 f7 d9                neg    %rcx
>      339:       48 85 ff                test   %rdi,%rdi
>      33c:       48 0f 49 c8             cmovns %rax,%rcx
>      340:       48 89 c8                mov    %rcx,%rax
>      343:       5b                      pop    %rbx
>      344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
>      34a:       0f 0b                   ud2
>      34c:       0f 0b                   ud2
>      34e:       31 c9                   xor    %ecx,%ecx
>      350:       e9 57 ff ff ff          jmp    2ac <dc_fixpt_recip+0x1c>
>      355:       0f 0b                   ud2
>      357:       eb d3                   jmp    32c <dc_fixpt_recip+0x9c>
>      359:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

I had to puzzle a bit to get that double ud2 -- not all configs do that.

Also, curse the DRM Makefiles, you can't do:

  make drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.s

:-(

They are the two ASSERT()s on line 217 and 54 respectively. They end up
asserting the same value twice, so that makes sense.

> $ clang --version | head -1
> clang version 20.0.0git (https://github.com/llvm/llvm-project.git
> 8daf4f16fa08b5d876e98108721dd1743a360326)

So I didn't have a recent build at hand.. so I've not validated the
below.

> There is "ud2" instruction at the end of dc_fixpt_recip(), its offset
> is "350", this "ud2" instruction is not dead end due to the offset "352"
> is in the relocation section '.rela.discard.reachable', that is to say,
> dc_fixpt_recip() doesn't end with a return instruction or an
> unconditional jump, so objtool determined that the function can fall
> through into the next function, thus there is "falls through" objtool
> warning.
> 
> 0000000000000290 <dc_fixpt_recip>:
>      290:       f3 0f 1e fa             endbr64
>      294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
>      299:       48 85 ff                test   %rdi,%rdi
>      29c:       0f 84 ac 00 00 00       je     34e <dc_fixpt_recip+0xbe>
>      2a2:       53                      push   %rbx
>      2a3:       48 89 f9                mov    %rdi,%rcx
>      2a6:       48 f7 d9                neg    %rcx
>      2a9:       48 0f 48 cf             cmovs  %rdi,%rcx
>      2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
>      2b4:       00 00 00
>      2b7:       31 d2                   xor    %edx,%edx
>      2b9:       48 f7 f1                div    %rcx
>      2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
>      2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
>      2c3:       45 88 c8                mov    %r9b,%r8b
>      2c6:       4d 01 c0                add    %r8,%r8
>      2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
>      2cd:       48 29 da                sub    %rbx,%rdx
>      2d0:       45 88 da                mov    %r11b,%r10b
>      2d3:       4c 89 d0                mov    %r10,%rax
>      2d6:       4c 09 c0                or     %r8,%rax
>      2d9:       83 c6 02                add    $0x2,%esi
>      2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
>      2de:       48 01 d2                add    %rdx,%rdx
>      2e1:       45 31 c0                xor    %r8d,%r8d
>      2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
>      2ea:       48 39 ca                cmp    %rcx,%rdx
>      2ed:       41 0f 93 c1             setae  %r9b
>      2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
>      2f3:       49 89 ca                mov    %rcx,%r10
>      2f6:       4c 29 d2                sub    %r10,%rdx
>      2f9:       48 01 d2                add    %rdx,%rdx
>      2fc:       45 31 d2                xor    %r10d,%r10d
>      2ff:       48 89 cb                mov    %rcx,%rbx
>      302:       48 39 ca                cmp    %rcx,%rdx
>      305:       41 0f 93 c3             setae  %r11b
>      309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
>      30b:       31 db                   xor    %ebx,%ebx
>      30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
>      30f:       48 01 d2                add    %rdx,%rdx
>      312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
>      319:       ff ff 7f
>      31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
>      320:       48 39 ca                cmp    %rcx,%rdx
>      323:       4c 0f 43 c6             cmovae %rsi,%r8
>      327:       4c 39 c0                cmp    %r8,%rax
>      32a:       77 1e                   ja     34a <dc_fixpt_recip+0xba>
>      32c:       48 39 ca                cmp    %rcx,%rdx
>      32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
>      333:       48 89 c1                mov    %rax,%rcx
>      336:       48 f7 d9                neg    %rcx
>      339:       48 85 ff                test   %rdi,%rdi
>      33c:       48 0f 49 c8             cmovns %rax,%rcx
>      340:       48 89 c8                mov    %rcx,%rax
>      343:       5b                      pop    %rbx
>      344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
>      34a:       0f 0b                   ud2
>      34c:       eb de                   jmp    32c <dc_fixpt_recip+0x9c>
>      34e:       0f 0b                   ud2
>      350:       0f 0b                   ud2
>      352:       66 66 66 66 66 2e 0f    data16 data16 data16 data16 cs nopw
> 0x0(%rax,%rax,1)
>      359:       1f 84 00 00 00 00 00

If you put them size-by-side, you'll see it's more or less the same
code-gen (trivial differences), but now it just stops code-gen, where
previously it would continue.

So this really is a compiler problem, this needs no annotation, it's
straight up broken.

Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
suspect clang figured that out and invokes UB on us and just stops
code-gen.

Nathan, Nick, don't we have a compiler flag that forces __builtin_trap()
whenever clang pulls something like this? I think UBSAN does this, but
we really shouldn't pull in the whole of that for sanity.
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Nathan Chancellor 12 months ago
On Fri, Dec 20, 2024 at 11:31:00AM +0100, Peter Zijlstra wrote:
> Also, curse the DRM Makefiles, you can't do:
> 
>   make drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.s

Small tip: You can get the path of the target by building
drivers/gpu/drm/amd/amdgpu/ and finding it in the output. In this case,
it'd be

  $ make drivers/gpu/drm/amd/amdgpu/../display/dc/basics/fixpt31_32.s

Not excusing that it does not work as it should but sometimes you have
to work with what you can *shrug*

> > $ clang --version | head -1
> > clang version 20.0.0git (https://github.com/llvm/llvm-project.git
> > 8daf4f16fa08b5d876e98108721dd1743a360326)
> 
> So I didn't have a recent build at hand.. so I've not validated the
> below.
...
> If you put them size-by-side, you'll see it's more or less the same
> code-gen (trivial differences), but now it just stops code-gen, where
> previously it would continue.
> 
> So this really is a compiler problem, this needs no annotation, it's
> straight up broken.
> 
> Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
> suspect clang figured that out and invokes UB on us and just stops
> code-gen.

Yeah, I think your analysis is spot on, as this was introduced by a
change in clang from a few months ago according to my bisect:

https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448

Since the ASSERT does not do anything to prevent the divide by zero (it
just flags it with WARN_ON) and the rest of the code doesn't either, I
assume that the codegen stops as soon as it encounters the unreachable
that change created from the path where divide by zero would occur via

  dc_fixpt_recip() ->
    dc_fixpt_from_fraction() ->
      complete_integer_division_u64() ->
        div64_u64_rem()

Shouldn't callers of division functions harden them against dividing by
zero?

> Nathan, Nick, don't we have a compiler flag that forces __builtin_trap()
> whenever clang pulls something like this? I think UBSAN does this, but
> we really shouldn't pull in the whole of that for sanity.

Right, I think that LLVM has a hidden flag for this:

  -mllvm -trap-unreachable

That makes this particular warning disappear.

It isn't the greatest because '-mllvm' flags need to be passed along to
the linker for LTO but that's easy enough to deal with. I know we have
talked about enabling that flag in the past but I cannot remember why we
decided against it (maybe code size concerns and other optimization
restrictions)? It looks like GCC has a similar flag,
-funreachable-traps.

Cheers,
Nathan
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Xi Ruoyao 12 months ago
On Fri, 2024-12-20 at 15:34 -0700, Nathan Chancellor wrote:
> > Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
> > suspect clang figured that out and invokes UB on us and just stops
> > code-gen.
> 
> Yeah, I think your analysis is spot on, as this was introduced by a
> change in clang from a few months ago according to my bisect:
> 
> https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448
> 
> Since the ASSERT does not do anything to prevent the divide by zero (it
> just flags it with WARN_ON) and the rest of the code doesn't either, I
> assume that the codegen stops as soon as it encounters the unreachable
> that change created from the path where divide by zero would occur via
> 
>   dc_fixpt_recip() ->
>     dc_fixpt_from_fraction() ->
>       complete_integer_division_u64() ->
>         div64_u64_rem()
> 
> Shouldn't callers of division functions harden them against dividing by
> zero?

Yes I think it'd be the correct solution.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Tiezhu Yang 12 months ago
On 12/21/2024 03:40 PM, Xi Ruoyao wrote:
> On Fri, 2024-12-20 at 15:34 -0700, Nathan Chancellor wrote:
>>> Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
>>> suspect clang figured that out and invokes UB on us and just stops
>>> code-gen.
>>
>> Yeah, I think your analysis is spot on, as this was introduced by a
>> change in clang from a few months ago according to my bisect:
>>
>> https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448
>>
>> Since the ASSERT does not do anything to prevent the divide by zero (it
>> just flags it with WARN_ON) and the rest of the code doesn't either, I
>> assume that the codegen stops as soon as it encounters the unreachable
>> that change created from the path where divide by zero would occur via
>>
>>   dc_fixpt_recip() ->
>>     dc_fixpt_from_fraction() ->
>>       complete_integer_division_u64() ->
>>         div64_u64_rem()
>>
>> Shouldn't callers of division functions harden them against dividing by
>> zero?
>
> Yes I think it'd be the correct solution.

Thank you all. Do you mean like this?

--- >8 ---

diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c 
b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
index 88d3f9d7dd55..848d8e67304a 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
@@ -79,11 +79,13 @@ struct fixed31_32 dc_fixpt_from_fraction(long long 
numerator, long long denomina
         unsigned long long arg2_value = arg2_negative ? -denominator : 
denominator;

         unsigned long long remainder;
+       unsigned long long res_value;

         /* determine integer part */

-       unsigned long long res_value = complete_integer_division_u64(
-               arg1_value, arg2_value, &remainder);
+       ASSERT(arg2_value);
+
+       res_value = complete_integer_division_u64(arg1_value, 
arg2_value, &remainder);

         ASSERT(res_value <= LONG_MAX);

@@ -214,8 +216,6 @@ struct fixed31_32 dc_fixpt_recip(struct fixed31_32 arg)
          * Good idea to use Newton's method
          */

-       ASSERT(arg.value);
-
         return dc_fixpt_from_fraction(
                 dc_fixpt_one.value,
                 arg.value);

With the above changes, there is no "falls through" objtool warning
compiled with both clang 19 and the latest mainline clang 20.

If you are OK with it, I will send a separate formal patch to handle
this issue after doing some more testing.

Thanks,
Tiezhu
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Nathan Chancellor 11 months, 4 weeks ago
On Sun, Dec 22, 2024 at 12:27:47PM +0800, Tiezhu Yang wrote:
> On 12/21/2024 03:40 PM, Xi Ruoyao wrote:
> > On Fri, 2024-12-20 at 15:34 -0700, Nathan Chancellor wrote:
> > > > Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
> > > > suspect clang figured that out and invokes UB on us and just stops
> > > > code-gen.
> > > 
> > > Yeah, I think your analysis is spot on, as this was introduced by a
> > > change in clang from a few months ago according to my bisect:
> > > 
> > > https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448
> > > 
> > > Since the ASSERT does not do anything to prevent the divide by zero (it
> > > just flags it with WARN_ON) and the rest of the code doesn't either, I
> > > assume that the codegen stops as soon as it encounters the unreachable
> > > that change created from the path where divide by zero would occur via
> > > 
> > >   dc_fixpt_recip() ->
> > >     dc_fixpt_from_fraction() ->
> > >       complete_integer_division_u64() ->
> > >         div64_u64_rem()
> > > 
> > > Shouldn't callers of division functions harden them against dividing by
> > > zero?
> > 
> > Yes I think it'd be the correct solution.
> 
> Thank you all. Do you mean like this?
> 
> --- >8 ---
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> index 88d3f9d7dd55..848d8e67304a 100644
> --- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> +++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
> @@ -79,11 +79,13 @@ struct fixed31_32 dc_fixpt_from_fraction(long long
> numerator, long long denomina
>         unsigned long long arg2_value = arg2_negative ? -denominator :
> denominator;
> 
>         unsigned long long remainder;
> +       unsigned long long res_value;
> 
>         /* determine integer part */
> 
> -       unsigned long long res_value = complete_integer_division_u64(
> -               arg1_value, arg2_value, &remainder);
> +       ASSERT(arg2_value);
> +
> +       res_value = complete_integer_division_u64(arg1_value, arg2_value,
> &remainder);
> 
>         ASSERT(res_value <= LONG_MAX);
> 
> @@ -214,8 +216,6 @@ struct fixed31_32 dc_fixpt_recip(struct fixed31_32 arg)
>          * Good idea to use Newton's method
>          */
> 
> -       ASSERT(arg.value);
> -
>         return dc_fixpt_from_fraction(
>                 dc_fixpt_one.value,
>                 arg.value);
> 
> With the above changes, there is no "falls through" objtool warning
> compiled with both clang 19 and the latest mainline clang 20.

I am somewhat surprised that changes anything because the ASSERT is not
stopping control flow so I would expect the same problem as before. I
guess it does not happen perhaps due to inlining differences? I looked
at this code briefly when I sent my initial message and I was not sure
where such a check should exist. It does not look like these functions
really do any sort of error handling.

> If you are OK with it, I will send a separate formal patch to handle
> this issue after doing some more testing.

It may still be worth doing this to get some initial thoughts from the
AMD DRM folks.

Cheers,
Nathan
Re: [PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Posted by Tiezhu Yang 11 months, 3 weeks ago
On 12/24/2024 05:46 AM, Nathan Chancellor wrote:
> On Sun, Dec 22, 2024 at 12:27:47PM +0800, Tiezhu Yang wrote:

...

>> With the above changes, there is no "falls through" objtool warning
>> compiled with both clang 19 and the latest mainline clang 20.
>
> I am somewhat surprised that changes anything because the ASSERT is not
> stopping control flow so I would expect the same problem as before. I
> guess it does not happen perhaps due to inlining differences? I looked

It is weird and I think it is not the correct way.

> at this code briefly when I sent my initial message and I was not sure
> where such a check should exist. It does not look like these functions
> really do any sort of error handling.
>
>> If you are OK with it, I will send a separate formal patch to handle
>> this issue after doing some more testing.
>
> It may still be worth doing this to get some initial thoughts from the
> AMD DRM folks.

I think the correct way is:

Keep the current ASSERT for the aim of debugging, just add BUG() to
stop control flow if the divisor is zero.

--- >8 ---

diff --git a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c 
b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
index 88d3f9d7dd55..e15391e36b40 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c
@@ -52,6 +52,7 @@ static inline unsigned long long 
complete_integer_division_u64(
         unsigned long long result;

         ASSERT(divisor);
+       BUG_ON(!divisor);

         result = div64_u64_rem(dividend, divisor, remainder);

diff --git a/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c 
b/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
index 131f1e3949d3..ce2036950808 100644
--- a/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
+++ b/drivers/gpu/drm/amd/display/dc/spl/spl_fixpt31_32.c
@@ -30,6 +30,7 @@ static inline unsigned long long 
spl_complete_integer_division_u64(
         unsigned long long result;

         SPL_ASSERT(divisor);
+       BUG_ON(!divisor);

         result = spl_div64_u64_rem(dividend, divisor, remainder);

It looks reasonable and works well both on x86 and LoongArch, there are
no the following objtool warnings:

   dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
   spl_fixpt_recip() falls through to next function spl_fixpt_sinc()

If no more comments, I will send a separate formal patch for your
review in the next week.

Thanks,
Tiezhu