x86/AMD: Fix ASM constraints in amd_clear_divider()

[PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()

Posted by Andrew Cooper 2 years, 6 months ago

DIV writes its results into %eax and %edx, meaning that they need to be output
constraints too.  It happens to be benign in this case as the registers don't
change value, but the compiler should still know.

Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Borislav Petkov <bp@alien8.de>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/cpu/amd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b55d8f82b621..8585a4be1912 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1325,6 +1325,8 @@ bool cpu_has_ibpb_brtype_microcode(void)
  */
 void noinstr amd_clear_divider(void)
 {
+	unsigned int a = 0, d = 0;
+
 	asm volatile(ALTERNATIVE("", "div %2\n\t", X86_BUG_DIV0)
-		     :: "a" (0), "d" (0), "r" (1));
+		     : "+a" (a), "+d" (d) : "r" (1));
 }

base-commit: cacc6e22932f373a91d7be55a9b992dc77f4c59b
-- 
2.30.2

Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()

Posted by Linus Torvalds 2 years, 6 months ago

On Wed, 9 Aug 2023 at 13:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> DIV writes its results into %eax and %edx, meaning that they need to be output
> constraints too.  It happens to be benign in this case as the registers don't
> change value, but the compiler should still know.
>
> Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")

As mentioned earlier (html, not on list), I think it was intentional
and this "fix" doesn't really fix anything.

A comment might be good, of course, if this really bothers somebody.

That said, if the code wanted to be *really* clever, it could have done

        asm volatile(ALTERNATIVE("", "div %0", X86_BUG_DIV0)
                     :: "a" (1), "d" (0));

instead. One less register used, and the same "no change to register
state" approach.

Of course, who knows what early-out algorithm the divider uses, and
maybe it's cheaper to do 0/1 than it is to do 1/1. Not that I think we
should care. The main reason to be cute here would be just to be cute.

That said, if you were to use this pattern in *real* code (as opposed
to in that function that will never be called in reality because
nobody divides by zero in real code), the register clobbers might be
useful just to make sure the compiler doesn't re-use a zero register
content that is then behind the latency of the dummy divide. But
again, this particular code really doesn't _matter_ in that sense.

               Linus

Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()

Posted by Andrew Cooper 2 years, 6 months ago

On 09/08/2023 10:33 pm, Linus Torvalds wrote:
> On Wed, 9 Aug 2023 at 13:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> DIV writes its results into %eax and %edx, meaning that they need to be output
>> constraints too.  It happens to be benign in this case as the registers don't
>> change value, but the compiler should still know.
>>
>> Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
> As mentioned earlier (html, not on list), I think it was intentional
> and this "fix" doesn't really fix anything.
>
> A comment might be good, of course, if this really bothers somebody.
>
> That said, if the code wanted to be *really* clever, it could have done
>
>         asm volatile(ALTERNATIVE("", "div %0", X86_BUG_DIV0)
>                      :: "a" (1), "d" (0));
>
> instead. One less register used, and the same "no change to register
> state" approach.

Yeah, I spotted that as an option, and it does save one whole zeroing
idiom...

But IMO, the risk of someone copy&pasting this as if it were a good
example, and the debugging thereafter ought to be enough of a reason to
avoid klever tricks to save 1 line of C.

> Of course, who knows what early-out algorithm the divider uses, and
> maybe it's cheaper to do 0/1 than it is to do 1/1. Not that I think we
> should care. The main reason to be cute here would be just to be cute.

AMD said "any non-faulting divide".  Which still isn't as helpful as it
could be, because according to Agner Fogh:

             Uops Latency
DIV  r8/m8    1    13-16
DIV  r16/m16  2    14-21
DIV  r32/m32  2    14-30
DIV  r64/m64  2    14-46
IDIV r8/m8    1    13-16
IDIV r16/m16  2    13-21
IDIV r32/m32  2    14-30
IDIV r64/m64  2    14-47

DIV %al looks to be the firm favourite choice.

Assuming the one extra cycle is just for the double-pumped uop, then the
best latency for a divide is 13 cycles across the board.

It doesn't make sense to optimise this as a fastpath.  After all, what
fool would put a real divide-by-1 in their code...

> That said, if you were to use this pattern in *real* code (as opposed
> to in that function that will never be called in reality because
> nobody divides by zero in real code), the register clobbers might be
> useful just to make sure the compiler doesn't re-use a zero register
> content that is then behind the latency of the dummy divide. But
> again, this particular code really doesn't _matter_ in that sense.

Well - that's a different question.

An attacker skilled in the art can easily hide #DE in the transient
shadow of something else, and plenty of people got very skilled in this
particular art trying to make better Meltdown exploits.

So I don't think putting any scrubbing in the #DE handler is going to
stop a real attack.  But I'm just speculating.

~Andrew

P.S. https://www.amd.com/system/files/documents/security-whitepaper.pdf
currently says

"The divide by zero (#DE) fault is signaled[sic] on the integer divide
instructions. No data is forwarded to younger, dependent operations for
speculative execution on this fault."

which needs to be revisited.  Zen1 was the latest-and-greatest when that
whitepaper was written.

Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()

Posted by Linus Torvalds 2 years, 6 months ago

On Wed, 9 Aug 2023 at 16:12, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> But IMO, the risk of someone copy&pasting this as if it were a good
> example, and the debugging thereafter ought to be enough of a reason to
> avoid klever tricks to save 1 line of C.

That's not the point. The point is that this is very special code, and
there's no way you can copy-and-paste it for anything else.

In fact, the very lack of outputs in the asm is part of what makes
such a copy-and-paste impossible.

You copy-and-paste that thing, and you simply don't get any useful
results, because the asm doesn't have any outputs.

There's literally no other possible use of that asm than as a "this
doesn't do anything but write to whatever stale divide buffers".

In other words, you might as well have fun with it. Because it is in
no way useful in any other way.

And I think that's a feature. This is *literally* not a divide that
gives any useful output. Don't try to make it look like it does.

              Linus