From: Thomas Gleixner
> Sent: 17 July 2022 00:17
>
> Folks!
>
> Back in the good old spectre v2 days (2018) we decided to not use
> IBRS. In hindsight this might have been the wrong decision because it did
> not force people to come up with alternative approaches.
>
> It was already discussed back then to try software based call depth
> accounting and RSB stuffing on underflow for Intel SKL[-X] systems to avoid
> the insane overhead of IBRS.
>
> This has been tried in 2018 and was rejected due to the massive overhead
> and other shortcomings of the approach to put the accounting into each
> function prologue:
>
>   1) Text size increase which is inflicted on everyone. While CPUs are
>      good in ignoring NOPs they still pollute the I-cache.
>
>   2) That results in tail call over-accounting which can be exploited.
>
> Disabling tail calls is not an option either and adding a 10 byte padding
> in front of every direct call is even worse in terms of text size and
> I-cache impact. We also could patch calls past the accounting in the
> function prologue but that becomes a nightmare vs. ENDBR.
>
> As IBRS is a performance horror show, Peter Zijlstra and me revisited the
> call depth tracking approach and implemented it in a way which is hopefully
> more palatable and avoids the downsides of the original attempt.
>
> We both unsurprisingly hate the result with a passion...
>
> The way we approached this is:
>
>   1) objtool creates a list of function entry points and a list of direct
>      call sites into new sections which can be discarded after init.
>
>   2) On affected machines, use the new sections, allocate module memory
>      and create a call thunk per function (16 bytes without
>      debug/statistics). Then patch all direct calls to invoke the thunk,
>      which does the call accounting and then jumps to the original call
>      site.
>
>   3) Utilize the retbleed return thunk mechanism by making the jump
>      target run-time configurable. Add the accounting counterpart and
>      stuff RSB on underflow in that alternate implementation.

What happens to indirect calls?
The above would imply that they miss the function entry thunk, but
get the return one.
Won't this lead to mis-counting of the RSB?

I also thought that retpolines would trash the return stack?
Using a single retpoline thunk would pretty much ensure that
they are never correctly predicted from the BTB, but it only
gives a single BTB entry that needs 'setting up' to get
mis-prediction.

I'm also sure I managed to infer from a document of instruction
timings and architectures that some x86 cpu actually used the BTB
for normal conditional jumps?
Possibly to avoid passing the full %ip value all down the cpu
pipeline.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

On Sun, Jul 17 2022 at 09:45, David Laight wrote:
> From: Thomas Gleixner
>>
>> 3) Utilize the retbleed return thunk mechanism by making the jump
>> target run-time configurable. Add the accounting counterpart and
>> stuff RSB on underflow in that alternate implementation.
>
> What happens to indirect calls?
> The above would imply that they miss the function entry thunk, but
> get the return one.
> Won't this lead to mis-counting of the RSB?

That's accounted in the indirect call thunk. This mitigation requires
retpolines to be enabled.

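To make the accounting concrete, here is a minimal Python sketch of the idea discussed above: both the per-function call thunk and the indirect call thunk bump one software counter, and the return thunk does the counterpart and refills when the counter says the RSB would underflow. This is a toy model, not the kernel implementation. The class name, the method names, the saturating counter and the exact refill point are all assumptions made for illustration.

```python
RSB_DEPTH = 16  # RSB depth as stated in this thread


class CallDepthTracker:
    """Toy model of software call depth accounting (hypothetical names)."""

    def __init__(self):
        self.depth = 0        # software estimate of valid RSB entries
        self.stuff_count = 0  # how often the return path had to refill

    def _account_call(self):
        # Assumption: the counter saturates at the RSB depth.
        self.depth = min(self.depth + 1, RSB_DEPTH)

    def direct_call_thunk(self):
        # Patched direct call: account, then jump to the function.
        self._account_call()

    def indirect_call_thunk(self):
        # Indirect calls do the same accounting in their own thunk.
        self._account_call()

    def return_thunk(self):
        # Accounting counterpart: refill before a return that would
        # otherwise run on an empty RSB (assumed refill point).
        if self.depth == 0:
            self.stuff_count += 1
            self.depth = RSB_DEPTH
        self.depth -= 1
```

With one direct and one indirect call followed by two returns the counter comes back to zero without any refill. A third return is the underflow case and triggers one refill in this model.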
> I also thought that retpolines would trash the return stack?

No. They prevent the CPU from misspeculating an indirect call due to a
mistrained BTB.

> Using a single retpoline thunk would pretty much ensure that
> they are never correctly predicted from the BTB, but it only
> gives a single BTB entry that needs 'setting up' to get mis-
> prediction.
BTB != RSB
The intra function call in the retpoline is of course adding a RSB entry
which points to the speculation trap, but that gets popped immediately
after that by the return which goes to the called function.

But that does not prevent the RSB underflow problem. As I described, the
RSB is a stack with depth 16. Call pushes, ret pops. So if speculation is
ahead and has emptied the RSB while speculating down the rets, then the next
speculated RET will fall back to other prediction mechanisms, which is
what the SKL specific retbleed variant exploits via BHB mistraining.

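The push/pop behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `ReturnStackBuffer`, `predict_ret` and `count_fallbacks` are hypothetical names, the oldest entry is assumed to be dropped on overflow, and "fallback" merely stands in for whatever other predictor the CPU consults once the RSB is empty.

```python
from collections import deque

RSB_DEPTH = 16  # depth stated above


class ReturnStackBuffer:
    """Toy RSB: call pushes, ret pops, an empty buffer means fallback."""

    def __init__(self, depth=RSB_DEPTH):
        # Assumption: on overflow the oldest entry is silently lost.
        self.entries = deque(maxlen=depth)

    def call(self, return_address):
        self.entries.append(return_address)

    def predict_ret(self):
        if self.entries:
            return ("rsb", self.entries.pop())
        # Underflow: the prediction comes from another mechanism.
        return ("fallback", None)


def count_fallbacks(call_depth):
    """Nest call_depth calls, then return from all of them."""
    rsb = ReturnStackBuffer()
    for address in range(call_depth):
        rsb.call(address)
    sources = [rsb.predict_ret()[0] for _ in range(call_depth)]
    return sources.count("fallback")
```

In this model a call chain of depth 16 returns entirely from the RSB, while a chain of depth 20 leaves the last four returns to the fallback predictor, which is the situation the mitigation tries to avoid.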
> I'm also sure I managed to infer from a document of instruction
> timings and architectures that some x86 cpu actually used the BTB
> for normal conditional jumps?
That's relevant to the problem at hand in which way?
Thanks,
tglx

From: Thomas Gleixner
> Sent: 17 July 2022 16:07
>
> On Sun, Jul 17 2022 at 09:45, David Laight wrote:
> > From: Thomas Gleixner
> >>
> >>  3) Utilize the retbleed return thunk mechanism by making the jump
> >>     target run-time configurable. Add the accounting counterpart and
> >>     stuff RSB on underflow in that alternate implementation.
> >
> > What happens to indirect calls?
> > The above would imply that they miss the function entry thunk, but
> > get the return one.
> > Won't this lead to mis-counting of the RSB?
>
> That's accounted in the indirect call thunk. This mitigation requires
> retpolines enabled.

Thanks, that wasn't in the summary.

> > I also thought that retpolines would trash the return stack?
>
> No. They prevent that the CPU misspeculates an indirect call due to a
> mistrained BTB.
>
> > Using a single retpoline thunk would pretty much ensure that
> > they are never correctly predicted from the BTB, but it only
> > gives a single BTB entry that needs 'setting up' to get mis-
> > prediction.
>
> BTB != RSB

I was thinking about what happens after the RSB has underflowed.
Which is when (I presume) the BTB based speculation happens.

> The intra function call in the retpoline is of course adding a RSB entry
> which points to the speculation trap, but that gets popped immediately
> after that by the return which goes to the called function.

I'm remembering the 'active' instructions in a retpoline being 'push; ret'.
Which is an RSB imbalance.

...
> > I'm also sure I managed to infer from a document of instruction
> > timings and architectures that some x86 cpu actually used the BTB
> > for normal conditional jumps?
>
> That's relevant to the problem at hand in which way?

The next problem :-)

	David

On Sun, Jul 17 2022 at 17:56, David Laight wrote:
> From: Thomas Gleixner
>> On Sun, Jul 17 2022 at 09:45, David Laight wrote:
> I was thinking about what happens after the RSB has underflowed.
> Which is when (I presume) the BTB based speculation happens.
>
>> The intra function call in the retpoline is of course adding a RSB entry
>> which points to the speculation trap, but that gets popped immediately
>> after that by the return which goes to the called function.
>
> I'm remembering the 'active' instructions in a retpoline being 'push; ret'.
> Which is an RSB imbalance.
Looking at the code might help to remember correctly:

	call	1f
	speculation trap
1:	mov	%reg, (%rsp)
	ret

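The difference between this sequence and a bare 'push; ret' can be checked with a small Python model. It is a sketch only: the RSB is a plain list, the function names are hypothetical, and the `mov` is assumed to store the target into the return address slot at the top of the stack, i.e. `mov %reg, (%rsp)`.

```python
def retpoline(rsb, target):
    """Model of: call 1f; speculation trap; 1: mov %reg, (%rsp); ret."""
    stack = []
    # call 1f: the return address (the speculation trap) goes onto the
    # architectural stack and onto the RSB.
    stack.append("speculation_trap")
    rsb.append("speculation_trap")
    # 1: mov %reg, (%rsp): overwrite the return address with the target.
    stack[-1] = target
    # ret: the prediction pops the RSB, the real target comes off the stack.
    predicted = rsb.pop() if rsb else None
    actual = stack.pop()
    return predicted, actual


def push_ret(rsb, target):
    """Model of a bare 'push %reg; ret', which has no matching call."""
    stack = [target]  # push %reg
    predicted = rsb.pop() if rsb else None
    actual = stack.pop()
    return predicted, actual
```

In the retpoline model the `ret` is predicted to land in the speculation trap while execution goes to the target, and the caller's RSB entry is left untouched. The 'push; ret' model consumes the caller's entry instead, which is the imbalance remembered in the quoted text.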
Thanks,
tglx