[Xen-devel] [PATCH] xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()

Juergen Gross posted 1 patch 4 years, 9 months ago
git fetch https://github.com/patchew-project/xen tags/patchew/20190729043624.16965-1-jgross@suse.com
[Xen-devel] [PATCH] xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()
Posted by Juergen Gross 4 years, 9 months ago
Continuing on the stack saved by __prepare_to_wait() on the wrong cpu
is rather dangerous.

Instead of doing so, just call the scheduler again, as is already done
in the similar case in __prepare_to_wait() when doing the setjmp()
would be wrong.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/wait.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/common/wait.c b/xen/common/wait.c
index 3fc5f68611..24716e7676 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -196,6 +196,11 @@ void check_wakeup_from_wait(void)
     {
         gdprintk(XENLOG_ERR, "vcpu affinity lost\n");
         domain_crash(curr->domain);
+
+        /* Re-initiate scheduler and don't longjmp(). */
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        for ( ; ; )
+            do_softirq();
     }
 
     /*
-- 
2.16.4
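For context, the mechanism at issue can be illustrated with a minimal,
self-contained sketch using plain setjmp()/longjmp() (not Xen's wait.c,
which additionally copies the hypervisor stack): longjmp() resumes
execution on the stack frame that was live at setjmp() time, which is
exactly why resuming on a different CPU's per-CPU stack is dangerous.

```c
#include <setjmp.h>

static jmp_buf buf;
static int trace;  /* records the order of execution as decimal digits */

/* Simulates the wakeup side: jump back to the context saved below. */
static void wake(void)
{
    trace = trace * 10 + 2;
    longjmp(buf, 1);      /* control reappears at the setjmp() call */
}

static int prepare_and_wait(void)
{
    if ( setjmp(buf) == 0 )
    {
        /* First pass: context saved, "go to sleep". */
        trace = trace * 10 + 1;
        wake();           /* never returns */
    }
    /* Second pass: resumed via longjmp(), on the *same* stack. */
    trace = trace * 10 + 3;
    return trace;
}
```

In Xen's variant the saved stack contents are memcpy()'d back before the
jump, so the destination stack must be the very one the context was
saved on; hence the refusal to longjmp() after an affinity mismatch.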


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()
Posted by Andrew Cooper 4 years, 9 months ago
On 29/07/2019 05:36, Juergen Gross wrote:
> Continuing on the stack saved by __prepare_to_wait() on the wrong cpu
> is rather dangerous.
>
> Instead of doing so, just call the scheduler again, as is already done
> in the similar case in __prepare_to_wait() when doing the setjmp()
> would be wrong.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

I'm afraid this is still problematic.  By successfully invoking the
waitqueue, we know that no spinlocks were held, but we have no guarantee
that e.g. an xmalloc()'d pointer is still only stashed in the stack.

The original behaviour of this code, on discovering a mismatched
affinity, was to set the affinity back to what it should be and try
again.  The purpose of this is to try and ensure that we (eventually)
can successfully longjmp() and unwind the stack.
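That original retry approach can be sketched schematically as follows
(names and the scheduler stand-in are illustrative, not Xen's actual
code): on a mismatch, repair the affinity and reschedule until we wake
on the saved CPU, at which point the longjmp() is safe.

```c
/* Illustrative stand-ins, not Xen's real scheduler state. */
static int current_cpu;      /* CPU we actually woke up on */
static int fixups;           /* how many affinity repairs we made */

/* Pretend scheduler: honours the requested affinity on the next wakeup. */
static void set_affinity_and_reschedule(int wanted_cpu)
{
    fixups++;
    current_cpu = wanted_cpu;
}

/* Old behaviour: on affinity mismatch, repair affinity and try again,
 * so that we eventually wake on the right CPU and may longjmp(). */
static int wakeup_retry(int saved_cpu)
{
    while ( current_cpu != saved_cpu )
        set_affinity_and_reschedule(saved_cpu);
    return fixups;           /* safe to longjmp() from here */
}
```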

~Andrew

Re: [Xen-devel] [PATCH] xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()
Posted by Juergen Gross 4 years, 9 months ago
On 29.07.19 10:34, Andrew Cooper wrote:
> On 29/07/2019 05:36, Juergen Gross wrote:
>> Continuing on the stack saved by __prepare_to_wait() on the wrong cpu
>> is rather dangerous.
>>
>> Instead of doing so, just call the scheduler again, as is already done
>> in the similar case in __prepare_to_wait() when doing the setjmp()
>> would be wrong.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> I'm afraid this is still problematic.  By successfully invoking the
> waitqueue, we know that no spinlocks were held, but we have no guarantee
> that e.g. an xmalloc()'d pointer is still only stashed in the stack.

But how are the domain_crash() invocations followed by do_softirq()
calls in the __prepare_to_wait() case fine then? At the point where we
either do setjmp() or longjmp(), it must be okay to throw away the
current stack, or there would already be inconsistencies.

So either there is already a problem in how the domain crashing is done
in __prepare_to_wait(), or my solution is fine, as it merely extends
the current behavior.


Juergen


Re: [Xen-devel] [PATCH] xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()
Posted by Andrew Cooper 4 years, 9 months ago
On 29/07/2019 09:49, Juergen Gross wrote:
> On 29.07.19 10:34, Andrew Cooper wrote:
>> On 29/07/2019 05:36, Juergen Gross wrote:
>>> Continuing on the stack saved by __prepare_to_wait() on the wrong cpu
>>> is rather dangerous.
>>>
>>> Instead of doing so, just call the scheduler again, as is already done
>>> in the similar case in __prepare_to_wait() when doing the setjmp()
>>> would be wrong.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> I'm afraid this is still problematic.  By successfully invoking the
>> waitqueue, we know that no spinlocks were held, but we have no guarantee
>> that e.g. an xmalloc()'d pointer is still only stashed in the stack.
>
> But how are the domain_crash() invocations followed by do_softirq()
> calls in the __prepare_to_wait() case fine then?

You make a very good point.

Seeing as this patch returns it to "no worse than before", and this code
isn't long for the world anyway, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>
