From nobody Sun Feb 8 02:08:26 2026
Date: Thu, 18 Dec 2025 09:54:33 +0000
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.52.0.313.g674ac2bdf7-goog
Message-ID: <20251218095434.1052422-1-edumazet@google.com>
Subject: [PATCH] x86/irqflags:
 Force a register output in native_save_fl()
From: Eric Dumazet
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H . Peter Anvin"
Cc: linux-kernel, Eric Dumazet, Eric Dumazet
Content-Type: text/plain; charset="utf-8"

clang generates very inefficient code for native_save_fl(), which is
used for local_irq_save() in a few critical spots.

Allowing the "pop %0" to use memory:

1) forces the compiler to add annoying stack canaries in many places
   when CONFIG_STACKPROTECTOR_STRONG=y.

2) is almost always followed by an immediate "move memory,register".

One good example is _raw_spin_lock_irqsave, with 8 extra instructions:

ffffffff82067a30 <_raw_spin_lock_irqsave>:
ffffffff82067a30: ...
ffffffff82067a39: 53                      push   %rbx

// Three instructions to adjust the stack, read the per-cpu canary
// and copy it to 8(%rsp)
ffffffff82067a3a: 48 83 ec 10             sub    $0x10,%rsp
ffffffff82067a3e: 65 48 8b 05 da 15 45 02 mov    %gs:0x24515da(%rip),%rax        # <__stack_chk_guard>
ffffffff82067a46: 48 89 44 24 08          mov    %rax,0x8(%rsp)

ffffffff82067a4b: 9c                      pushf

// instead of pop %rbx, compiler uses 2 instructions.
ffffffff82067a4c: 8f 04 24                pop    (%rsp)
ffffffff82067a4f: 48 8b 1c 24             mov    (%rsp),%rbx

ffffffff82067a53: fa                      cli
ffffffff82067a54: b9 01 00 00 00          mov    $0x1,%ecx
ffffffff82067a59: 31 c0                   xor    %eax,%eax
ffffffff82067a5b: f0 0f b1 0f             lock cmpxchg %ecx,(%rdi)
ffffffff82067a5f: 75 1d                   jne    ffffffff82067a7e <_raw_spin_lock_irqsave+0x4e>

// three instructions to check the stack canary
ffffffff82067a61: 65 48 8b 05 b7 15 45 02 mov    %gs:0x24515b7(%rip),%rax        # <__stack_chk_guard>
ffffffff82067a69: 48 3b 44 24 08          cmp    0x8(%rsp),%rax
ffffffff82067a6e: 75 17                   jne    ffffffff82067a87
...

// One extra instruction to adjust the stack.
ffffffff82067a73: 48 83 c4 10             add    $0x10,%rsp
...

// One more instruction in case the stack was mangled.
ffffffff82067a87: e8 a4 35 ff ff          call   ffffffff8205b030 <__stack_chk_fail>

This patch changes almost nothing for gcc, but saves 23153 bytes
of text with clang, while allowing more functions to be inlined.

$ size vmlinux.gcc.before vmlinux.gcc.after vmlinux.clang.before vmlinux.clang.after
    text     data     bss      dec     hex filename
45564911 25005415 4708896 75279221 47cab75 vmlinux.gcc.before
45564901 25005414 4708896 75279211 47cab6b vmlinux.gcc.after
45115647 24638569 5537072 75291288 47cda98 vmlinux.clang.before
45092494 24638585 5545000 75276079 47c9f2f vmlinux.clang.after

$ scripts/bloat-o-meter -t vmlinux.clang.before vmlinux.clang.after
add/remove: 0/3 grow/shrink: 22/533 up/down: 5162/-25088 (-19926)
Function                                     old     new   delta
__noinstr_text_start                         640    3584   +2944
wakeup_cpu_via_vmgexit                      1002    1447    +445
rcu_tasks_trace_pregp_step                  1052    1454    +402
snp_kexec_finish                            1290    1527    +237
check_all_holdout_tasks_trace                909    1106    +197
x2apic_send_IPI_mask_allbutself               38     198    +160
hpet_set_rtc_irq_bit                         118     265    +147
x2apic_send_IPI_mask                          38     184    +146
ring_buffer_poll_wait                        261     405    +144
rb_watermark_hit                             253     386    +133
...
tcp_wfree                                    402     332     -70
stacktrace_trigger                           133      62     -71
w1_touch_bit                                 418     343     -75
w1_triplet                                   446     370     -76
link_create                                  980     902     -78
drain_dead_softirq_workfn                    425     347     -78
kcryptd_queue_crypt                          253     174     -79
perf_event_aux_pause                         448     368     -80
idle_worker_timeout                          320     240     -80
srcu_funnel_exp_start                        418     333     -85
call_rcu                                     751     666     -85
enable_IR_x2apic                             279     191     -88
bpf_link_free                                432     342     -90
synchronize_rcu                              497     403     -94
identify_cpu                                2665    2569     -96
ftrace_modify_all_code                       355     258     -97
load_gs_index                                212     104    -108
verity_end_io                                369     257    -112
bpf_prog_detach                              672     555    -117
__x2apic_send_IPI_mask                       552     275    -277
snp_cleanup_vmsa                             284       -    -284
__end_rodata                              606208  602112   -4096
Total: Before=28577927, After=28558001, chg -0.07%

Signed-off-by: Eric Dumazet
---
 arch/x86/include/asm/irqflags.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index b30e5474c18e1be63b7c69354c26ae6a6cb02731..a06bdc51b6e3d04c06352c28389b10ec66551a0e 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -18,14 +18,9 @@ extern __always_inline unsigned long native_save_fl(void)
 {
 	unsigned long flags;
 
-	/*
-	 * "=rm" is safe here, because "pop" adjusts the stack before
-	 * it evaluates its effective address -- this is part of the
-	 * documented behavior of the "pop" instruction.
-	 */
 	asm volatile("# __raw_save_flags\n\t"
 		     "pushf ; pop %0"
-		     : "=rm" (flags)
+		     : "=r" (flags)
 		     : /* no input */
 		     : "memory");
 
-- 
2.52.0.313.g674ac2bdf7-goog
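
For readers who want to try the "=r" form outside the kernel, here is a
minimal user-space sketch. It is not part of the patch. It assumes x86-64
with GCC or Clang, and user_save_fl(), flags_irqs_enabled() and the two
SKETCH_* macros are hypothetical names chosen for illustration. Unlike the
kernel, user space is normally built with the red zone enabled, so the
sketch steps over it before pushf; that step is an addition of this
example and not something native_save_fl() does.

```c
/* User-space sketch only: assumes x86-64, GCC or Clang, AT&T syntax. */
#if !defined(__x86_64__)
#error "this sketch assumes x86-64"
#endif

/* Hypothetical local names for two RFLAGS bits. */
#define SKETCH_FLAGS_FIXED 0x002UL /* bit 1: always reads as 1 */
#define SKETCH_FLAGS_IF    0x200UL /* bit 9: interrupt enable flag */

/* Hypothetical user-space analogue of native_save_fl(). */
static inline unsigned long user_save_fl(void)
{
	unsigned long flags;

	/*
	 * "=r" forces a register destination, so "popq %0" never
	 * targets a stack slot. The two lea instructions skip the
	 * 128-byte red zone and restore %rsp; lea leaves the flags
	 * untouched, so the value pushed by pushfq is unaffected.
	 */
	__asm__ volatile("lea -128(%%rsp), %%rsp\n\t"
			 "pushfq\n\t"
			 "popq %0\n\t"
			 "lea 128(%%rsp), %%rsp"
			 : "=r" (flags)
			 : /* no input */
			 : "memory");

	return flags;
}

/* Returns nonzero when the saved flags have the IF bit set. */
static inline int flags_irqs_enabled(unsigned long flags)
{
	return (flags & SKETCH_FLAGS_IF) != 0;
}
```

In an ordinary user-space process, the value returned by user_save_fl()
should have both SKETCH_FLAGS_FIXED and SKETCH_FLAGS_IF set, because bit 1
always reads as 1 and user code runs with interrupts enabled. A memory
operand ("=rm") could not be used safely in this sketch: a %rsp-relative
address chosen by the compiler would be evaluated against the shifted
stack pointer.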