[PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile

Tiezhu Yang posted 1 patch 1 week, 1 day ago
arch/loongarch/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile
Posted by Tiezhu Yang 1 week, 1 day ago
Currently, when compiling with GCC, no "break 0x7" instruction is
generated for zero division because the option -mno-check-zero-division
is used, but the compiler still generates a "break 0x0" instruction for
zero division.

Here is a simple example:

  $ cat test.c
  int div(int a)
  {
	  return a / 0;
  }
  $ gcc -O2 -S test.c -o test.s

GCC generates "break 0" on LoongArch and "ud2" on x86. objtool decodes
"ud2" as INSN_BUG for x86, so decoding "break 0" as INSN_BUG would fix
the objtool warnings for LoongArch, but that is not the intention.

When "break 0" was decoded as INSN_TRAP in the previous commit, the aim
was to handle "break 0" as a trap. The "break 0" that GCC generates for
zero division is not appropriate; it should generate a break instruction
with a proper bug type. So add the GCC option
-fno-isolate-erroneous-paths-dereference to avoid generating the
unexpected "break 0" instruction for now.
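
The trap is not limited to a literal "/ 0". As a minimal sketch, assuming
GCC's path isolation also covers a divisor that is zero on only one
incoming path, the hypothetical function below (div_by_flag is an
illustrative name, not kernel code) may get a "break 0" for the
flag == 0 path at -O2, and a plain division when built with
-fno-isolate-erroneous-paths-dereference:

```c
/*
 * Hypothetical variation of test.c; not taken from the kernel.
 * The divisor is zero only on the path where flag == 0, so a compiler
 * that isolates erroneous paths may split that path off and end it
 * with a trap instruction instead of a division.
 */
int div_by_flag(int a, int flag)
{
	int d = flag ? 4 : 0;

	/* Undefined behavior when flag == 0; well defined otherwise. */
	return a / d;
}
```

Compiling this with and without the option (for example with
"gcc -O2 -S") and comparing the two assembly files is one way to check
what a given GCC version actually emits.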

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202509200413.7uihAxJ5-lkp@intel.com/
Fixes: baad7830ee9a ("objtool/LoongArch: Mark types based on break immediate code")
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index ae419e32f22e..f2a585b4a937 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -129,7 +129,7 @@ KBUILD_RUSTFLAGS_KERNEL		+= -Crelocation-model=pie
 LDFLAGS_vmlinux			+= -static -pie --no-dynamic-linker -z notext $(call ld-option, --apply-dynamic-relocs)
 endif
 
-cflags-y += $(call cc-option, -mno-check-zero-division)
+cflags-y += $(call cc-option, -mno-check-zero-division -fno-isolate-erroneous-paths-dereference)
 
 ifndef CONFIG_KASAN
 cflags-y += -fno-builtin-memcpy -fno-builtin-memmove -fno-builtin-memset
-- 
2.42.0
Re: [PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile
Posted by Huacai Chen 1 week, 1 day ago
Hi, Tiezhu,

On Tue, Sep 23, 2025 at 2:17 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> Currently, when compiling with GCC, there is no "break 0x7" instruction
> for zero division due to using the option -mno-check-zero-division, but
> the compiler still generates "break 0x0" instruction for zero division.
>
> Here is a simple example:
>
>   $ cat test.c
>   int div(int a)
>   {
>           return a / 0;
>   }
>   $ gcc -O2 -S test.c -o test.s
>
> GCC generates "break 0" On LoongArch and "ud2" on x86, objtool decodes
> "ud2" as INSN_BUG for x86, so decode "break 0" as INSN_BUG can fix the
> objtool warnings for LoongArch, but this is not the intention.
>
> When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
> to handle "break 0" as a trap. The generated "break 0" for zero division
> by GCC is not proper, it should generate a break instruction with proper
> bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
> to avoid generating the unexpected "break 0" instruction for now.
You said that this patch makes performance increase a little. But this
is strange, because -fisolate-erroneous-paths-dereference, rather than
-fno-isolate-erroneous-paths-dereference, is considered an
optimization.

Huacai

Re: [PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile
Posted by Tiezhu Yang 3 days, 17 hours ago
On 2025/9/23 10:32 PM, Huacai Chen wrote:
> Hi, Tiezhu,
> 
> On Tue, Sep 23, 2025 at 2:17 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>>
>> Currently, when compiling with GCC, there is no "break 0x7" instruction
>> for zero division due to using the option -mno-check-zero-division, but
>> the compiler still generates "break 0x0" instruction for zero division.
>>
>> Here is a simple example:
>>
>>    $ cat test.c
>>    int div(int a)
>>    {
>>            return a / 0;
>>    }
>>    $ gcc -O2 -S test.c -o test.s
>>
>> GCC generates "break 0" On LoongArch and "ud2" on x86, objtool decodes
>> "ud2" as INSN_BUG for x86, so decode "break 0" as INSN_BUG can fix the
>> objtool warnings for LoongArch, but this is not the intention.
>>
>> When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
>> to handle "break 0" as a trap. The generated "break 0" for zero division
>> by GCC is not proper, it should generate a break instruction with proper
>> bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
>> to avoid generating the unexpected "break 0" instruction for now.
> You said that this patch make performance increase a little. But this
> is strange, because -isolate-erroneous-paths-dereference rather than
> -no-isolate-erroneous-paths-dereference is considered as an
> optimization.

I tested Linux 6.17-rc7 with loongson3_defconfig and saw only a
small improvement (about 0.3%) with "./Run -c 1".

Here are the test steps; anyone who is interested can run them
again to get actual results in their own environment:

   git clone https://github.com/kdlucas/byte-unixbench.git
   cd byte-unixbench/UnixBench/
   make
   ./Run -c 1
   ./Run -c 8

Here is the objdump output for sched_update_scaling() in
kernel/sched/fair.o:

Before:

000000000000bbc8 <sched_update_scaling>:
     bbc8:       1a00000c        pcalau12i       $t0, 0
     bbcc:       1a00000d        pcalau12i       $t1, 0
     bbd0:       02c0018c        addi.d          $t0, $t0, 0
     bbd4:       288001ae        ld.w            $t2, $t1, 0
     bbd8:       24000190        ldptr.w         $t4, $t0, 0
     bbdc:       004081ce        slli.w          $t2, $t2, 0x0
     bbe0:       40006200        beqz            $t4, 96 # bc40 <sched_update_scaling+0x78>
     bbe4:       0280200d        addi.w          $t1, $zero, 8
     bbe8:       0012b9ad        sltu            $t1, $t1, $t2
     bbec:       02802012        addi.w          $t6, $zero, 8
     bbf0:       0013b5cf        masknez         $t3, $t2, $t1
     bbf4:       0013364d        maskeqz         $t1, $t6, $t1
     bbf8:       001535ed        or              $t1, $t3, $t1
     bbfc:       02800811        addi.w          $t5, $zero, 2
     bc00:       004081af        slli.w          $t3, $t1, 0x0
     bc04:       58001611        beq             $t4, $t5, 20    # bc18 <sched_update_scaling+0x50>
     bc08:       400051c0        beqz            $t2, 80 # bc58 <sched_update_scaling+0x90>
     bc0c:       000015ad        clz.w           $t1, $t1
     bc10:       0280800f        addi.w          $t3, $zero, 32
     bc14:       001135ef        sub.w           $t3, $t3, $t1
     bc18:       24000d8d        ldptr.w         $t1, $t0, 12
     bc1c:       00150004        or              $a0, $zero, $zero
     bc20:       00213dad        div.wu          $t1, $t1, $t3
     bc24:       2980418d        st.w            $t1, $t0, 16
     bc28:       4c000020        jirl            $zero, $ra, 0
     bc2c:       03400000        andi            $zero, $zero, 0x0
     bc30:       03400000        andi            $zero, $zero, 0x0
     bc34:       03400000        andi            $zero, $zero, 0x0
     bc38:       03400000        andi            $zero, $zero, 0x0
     bc3c:       03400000        andi            $zero, $zero, 0x0
     bc40:       24000d8d        ldptr.w         $t1, $t0, 12
     bc44:       0280040f        addi.w          $t3, $zero, 1
     bc48:       00150004        or              $a0, $zero, $zero
     bc4c:       00213dad        div.wu          $t1, $t1, $t3
     bc50:       2980418d        st.w            $t1, $t0, 16
     bc54:       4c000020        jirl            $zero, $ra, 0
     bc58:       002a0000        break           0x0
     bc5c:       03400000        andi            $zero, $zero, 0x0

After:

000000000000bbc8 <sched_update_scaling>:
     bbc8:       1a00000c        pcalau12i       $t0, 0
     bbcc:       1a00000d        pcalau12i       $t1, 0
     bbd0:       02c0018c        addi.d          $t0, $t0, 0
     bbd4:       288001ae        ld.w            $t2, $t1, 0
     bbd8:       24000190        ldptr.w         $t4, $t0, 0
     bbdc:       0280040f        addi.w          $t3, $zero, 1
     bbe0:       004081ce        slli.w          $t2, $t2, 0x0
     bbe4:       40003a00        beqz            $t4, 56 # bc1c <sched_update_scaling+0x54>
     bbe8:       0280200d        addi.w          $t1, $zero, 8
     bbec:       0012b9ad        sltu            $t1, $t1, $t2
     bbf0:       02802012        addi.w          $t6, $zero, 8
     bbf4:       0013b5cf        masknez         $t3, $t2, $t1
     bbf8:       0013364d        maskeqz         $t1, $t6, $t1
     bbfc:       001535ed        or              $t1, $t3, $t1
     bc00:       02800811        addi.w          $t5, $zero, 2
     bc04:       004081af        slli.w          $t3, $t1, 0x0
     bc08:       58001611        beq             $t4, $t5, 20    # bc1c <sched_update_scaling+0x54>
     bc0c:       000015ad        clz.w           $t1, $t1
     bc10:       0280800f        addi.w          $t3, $zero, 32
     bc14:       001135ef        sub.w           $t3, $t3, $t1
     bc18:       001339ef        maskeqz         $t3, $t3, $t2
     bc1c:       24000d8d        ldptr.w         $t1, $t0, 12
     bc20:       00150004        or              $a0, $zero, $zero
     bc24:       00213dad        div.wu          $t1, $t1, $t3
     bc28:       2980418d        st.w            $t1, $t0, 16
     bc2c:       4c000020        jirl            $zero, $ra, 0

With this patch there is no beqz instruction guarding the zero
division, which I guess affects performance to some extent. IMO, the
isolate-erroneous-paths-dereference optimization is for the error
code path, not for performance.
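
To make the two listings easier to compare, here is a simplified
user-space model of what they compute. It is inferred from the
disassembly only and is not the kernel source; every name in it
(scaling_mode, fls32, model_factor, divide_isolated, divide_plain) is
hypothetical. divide_isolated corresponds roughly to the "Before"
shape, where the zero-divisor path ends in a trap, and divide_plain to
the "After" shape, where the division is executed unconditionally.

```c
#include <stdint.h>

/* Hypothetical names; the values follow the comparisons in the listing. */
enum scaling_mode { SCALING_NONE = 0, SCALING_LOG = 1, SCALING_LINEAR = 2 };

/*
 * 1-based index of the highest set bit, 0 for x == 0.
 * Models the "32 - clz.w" sequence; the explicit x == 0 check is needed
 * in C because __builtin_clz(0) is undefined.
 */
static unsigned int fls32(uint32_t x)
{
	return x ? 32u - (unsigned int)__builtin_clz(x) : 0u;
}

/* Divisor selection: 1, min(cpus, 8), or fls32(min(cpus, 8)). */
unsigned int model_factor(enum scaling_mode mode, unsigned int online_cpus)
{
	unsigned int cpus = online_cpus < 8 ? online_cpus : 8;

	switch (mode) {
	case SCALING_NONE:
		return 1;
	case SCALING_LINEAR:
		return cpus;
	case SCALING_LOG:
	default:
		return fls32(cpus);	/* 0 when cpus == 0 */
	}
}

/* "Before" shape: the zero-divisor path is split off and traps. */
unsigned int divide_isolated(unsigned int value, unsigned int factor)
{
	if (factor == 0)
		__builtin_trap();
	return value / factor;
}

/* "After" shape: no extra branch; factor == 0 is undefined behavior in C. */
unsigned int divide_plain(unsigned int value, unsigned int factor)
{
	return value / factor;
}
```

In this model the only way to reach a zero divisor is a CPU count of
zero, which is why the isolated path is an error path rather than a
hot path.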

Anyway, my initial aim was to check whether there is any performance
regression. Judging from the test results, there is no obvious
difference with this patch.

Thanks,
Tiezhu

Re: [PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile
Posted by Xi Ruoyao 1 week, 1 day ago
On Tue, 2025-09-23 at 14:17 +0800, Tiezhu Yang wrote:
> Currently, when compiling with GCC, there is no "break 0x7" instruction
> for zero division due to using the option -mno-check-zero-division, but
> the compiler still generates "break 0x0" instruction for zero division.
> 
> Here is a simple example:
> 
>   $ cat test.c
>   int div(int a)
>   {
> 	  return a / 0;
>   }
>   $ gcc -O2 -S test.c -o test.s
> 
> GCC generates "break 0" On LoongArch and "ud2" on x86, objtool decodes
> "ud2" as INSN_BUG for x86, so decode "break 0" as INSN_BUG can fix the
> objtool warnings for LoongArch, but this is not the intention.
> 
> When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
> to handle "break 0" as a trap. The generated "break 0" for zero division
> by GCC is not proper, it should generate a break instruction with proper
> bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
> to avoid generating the unexpected "break 0" instruction for now.

I just proposed that GCC use the same "documented undefined instruction"
as Clang:
https://gcc.gnu.org/pipermail/gcc-patches/2025-September/695981.html.

-- 
Xi Ruoyao <xry111@xry111.site>
Re: [PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile
Posted by Huacai Chen 3 days, 13 hours ago
On Tue, Sep 23, 2025 at 8:39 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Tue, 2025-09-23 at 14:17 +0800, Tiezhu Yang wrote:
> > Currently, when compiling with GCC, there is no "break 0x7" instruction
> > for zero division due to using the option -mno-check-zero-division, but
> > the compiler still generates "break 0x0" instruction for zero division.
> >
> > Here is a simple example:
> >
> >   $ cat test.c
> >   int div(int a)
> >   {
> >         return a / 0;
> >   }
> >   $ gcc -O2 -S test.c -o test.s
> >
> > GCC generates "break 0" On LoongArch and "ud2" on x86, objtool decodes
> > "ud2" as INSN_BUG for x86, so decode "break 0" as INSN_BUG can fix the
> > objtool warnings for LoongArch, but this is not the intention.
> >
> > When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
> > to handle "break 0" as a trap. The generated "break 0" for zero division
> > by GCC is not proper, it should generate a break instruction with proper
> > bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
> > to avoid generating the unexpected "break 0" instruction for now.
>
> I just proposed GCC to use the same "documented undefined instruction"
> as Clang:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-September/695981.html.
I have discussed it with Tiezhu offline, and we prefer "break 0x1".

Huacai
