On 29/07/2024 10:37 pm, Stefano Stabellini wrote:
> On Mon, 29 Jul 2024, Julien Grall wrote:
>> Hi,
>>
>> On 29/07/2024 15:24, Stewart Hildebrand wrote:
>>> This series introduces stack check instrumentation on Arm. This is
>>> helpful for safety certification efforts. I'm sending this in an RFC
>>> state because I wanted to gather opinions on the approach of using
>>> -finstrument-functions.
>> This looks ok for an initial approach. I wonder if longer term we want to
>> implement stack guards on Arm. We would need to allocate an extra "virtual"
>> page per stack that would be unmapped.
>>
>> The advantage is this could be used also in production and doesn't rely on any
>> support from the processor.
>>
>> Any thoughts?
> I think we need both. We should implement stack guards on Arm. In
> addition, it is also beneficial to have -finstrument-functions for
> profiling, debugging, and also so that we can retrieve detailed call
> graphs from execution runs. As an example, -finstrument-functions can
> help with offline analysis to prove that we don't have unbounded
> recursion, on both arm and x86 too. On the other hand, stack guards help
> with protecting the stack in production.
x86 unconditionally has stack guards. It was easier doing it this way
than to have conditional stack guards and conditional shadow stacks
(hardware CET stacks).
In most cases, the #DF handler will recognise hitting a guard page and
panic() with information about the stack overflow.
The one case where this wont happen is on AMD hardware in HVM(SVM)
context, because the SVM designers and the AMD64 designers didn't talk
to each other when developing the respective features[1]...
In this case, Xen will take a clean[2] triple fault and reset on stack
overflow.
~Andrew
[1] The Task Register isn't switched on VMRUN, and while this was fine
in 32bit CPUs, it wasn't fine when 64bit CPUs replaced Task Switches
with the Interrupt Stack Table mechanism. It should be fixed when AMD
implement FRED support. For current CPUs, we can in principle fix this,
but at added latency to vmentry/exit, which is why we've not done so.
If someone feels like adding a debug mode for it, I'm sure that would be
acceptable.
[2] If you can call such a thing "clean", but it won't execute off into
the weeds.
[3] (tangent from another thread). 14k of stack!?!. x86 has 8k and
we've only ever hit that in error cases...