[PATCH] consolidate do_bug_frame() / bug_fn_t

Jan Beulich posted 1 patch 3 months, 2 weeks ago
Failed in applying to current master (apply log)
[PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Jan Beulich 3 months, 2 weeks ago
The type not being used in do_bug_frame() is suspicious. Apparently
that's solely because the type uses a pointer-to-const parameter,
when run_in_exception_handler() wants functions taking pointer-to-non-
const. Drop the const, in turn requiring Arm's do_bug_frame() to also
have its const dropped. This then brings that function also closer to
the common one, with Arm's use of vaddr_t remaining as a difference.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This is an alternative proposal to
https://lists.xen.org/archives/html/xen-devel/2023-12/msg01385.html,
albeit without paving a road towards Andrew's desire of getting rid of
show_execution_state_nonconst() again. Retaining (and propagating) the
const would imply the need to cast away the const-ness somewhere on (at
least) the path to invoking gdb stub code. Personally I'm averse to such
casting away of const-ness ...

--- a/xen/arch/arm/include/asm/traps.h
+++ b/xen/arch/arm/include/asm/traps.h
@@ -69,7 +69,7 @@ void do_cp(struct cpu_user_regs *regs, c
 void do_trap_smc(struct cpu_user_regs *regs, const union hsr hsr);
 void do_trap_hvc_smccc(struct cpu_user_regs *regs);
 
-int do_bug_frame(const struct cpu_user_regs *regs, vaddr_t pc);
+int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc);
 
 void noreturn do_unexpected_trap(const char *msg,
                                  const struct cpu_user_regs *regs);
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1202,7 +1202,7 @@ void do_unexpected_trap(const char *msg,
     panic("CPU%d: Unexpected Trap: %s\n", smp_processor_id(), msg);
 }
 
-int do_bug_frame(const struct cpu_user_regs *regs, vaddr_t pc)
+int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
 {
     const struct bug_frame *bug = NULL;
     const char *prefix = "", *filename, *predicate;
--- a/xen/common/bug.c
+++ b/xen/common/bug.c
@@ -63,14 +63,10 @@ int do_bug_frame(struct cpu_user_regs *r
 
     if ( id == BUGFRAME_run_fn )
     {
-        void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
+        bug_fn_t *fn = bug_ptr(bug);
 
         fn(regs);
 
-        /* Re-enforce consistent types, because of the casts involved. */
-        if ( false )
-            run_in_exception_handler(fn);
-
         return id;
     }
 
--- a/xen/include/xen/bug.h
+++ b/xen/include/xen/bug.h
@@ -101,12 +101,11 @@ struct bug_frame {
 #endif
 
 struct cpu_user_regs;
-typedef void bug_fn_t(const struct cpu_user_regs *regs);
+typedef void bug_fn_t(struct cpu_user_regs *regs);
 
 #ifndef run_in_exception_handler
 
-static void always_inline run_in_exception_handler(
-    void (*fn)(struct cpu_user_regs *regs))
+static void always_inline run_in_exception_handler(bug_fn_t *fn)
 {
     BUG_FRAME(BUGFRAME_run_fn, 0, fn, 0, NULL);
 }
Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Stefano Stabellini 3 months ago
Hi Jan, Andrew,

I managed to get back to read the mailing list and noticed this patch.

Is it still relevant and needs to be reviewed?

Are there any outstanding disagreements between maintainers on the
approach to take here?  Or should I just go ahead and review it?


On Tue, 9 Jan 2024, Jan Beulich wrote:
> The type not being used in do_bug_frame() is suspicious. Apparently
> that's solely because the type uses a pointer-to-const parameter,
> when run_in_exception_handler() wants functions taking pointer-to-non-
> const. Drop the const, in turn requiring Arm's do_bug_frame() to also
> have its const dropped. This then brings that function also closer to
> the common one, with Arm's use of vaddr_t remaining as a difference.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This is an alternative proposal to
> https://lists.xen.org/archives/html/xen-devel/2023-12/msg01385.html,
> albeit without paving a road towards Andrew's desire of getting rid of
> show_execution_state_nonconst() again. Retaining (and propagating) the
> const would imply the need to cast away the const-ness somewhere on (at
> least) the path to invoking gdb stub code. Personally I'm averse to such
> casting away of const-ness ...
> 
> --- a/xen/arch/arm/include/asm/traps.h
> +++ b/xen/arch/arm/include/asm/traps.h
> @@ -69,7 +69,7 @@ void do_cp(struct cpu_user_regs *regs, c
>  void do_trap_smc(struct cpu_user_regs *regs, const union hsr hsr);
>  void do_trap_hvc_smccc(struct cpu_user_regs *regs);
>  
> -int do_bug_frame(const struct cpu_user_regs *regs, vaddr_t pc);
> +int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc);
>  
>  void noreturn do_unexpected_trap(const char *msg,
>                                   const struct cpu_user_regs *regs);
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1202,7 +1202,7 @@ void do_unexpected_trap(const char *msg,
>      panic("CPU%d: Unexpected Trap: %s\n", smp_processor_id(), msg);
>  }
>  
> -int do_bug_frame(const struct cpu_user_regs *regs, vaddr_t pc)
> +int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
>  {
>      const struct bug_frame *bug = NULL;
>      const char *prefix = "", *filename, *predicate;
> --- a/xen/common/bug.c
> +++ b/xen/common/bug.c
> @@ -63,14 +63,10 @@ int do_bug_frame(struct cpu_user_regs *r
>  
>      if ( id == BUGFRAME_run_fn )
>      {
> -        void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
> +        bug_fn_t *fn = bug_ptr(bug);
>  
>          fn(regs);
>  
> -        /* Re-enforce consistent types, because of the casts involved. */
> -        if ( false )
> -            run_in_exception_handler(fn);
> -
>          return id;
>      }
>  
> --- a/xen/include/xen/bug.h
> +++ b/xen/include/xen/bug.h
> @@ -101,12 +101,11 @@ struct bug_frame {
>  #endif
>  
>  struct cpu_user_regs;
> -typedef void bug_fn_t(const struct cpu_user_regs *regs);
> +typedef void bug_fn_t(struct cpu_user_regs *regs);
>  
>  #ifndef run_in_exception_handler
>  
> -static void always_inline run_in_exception_handler(
> -    void (*fn)(struct cpu_user_regs *regs))
> +static void always_inline run_in_exception_handler(bug_fn_t *fn)
>  {
>      BUG_FRAME(BUGFRAME_run_fn, 0, fn, 0, NULL);
>  }
>
Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Jan Beulich 3 months ago
On 24.01.2024 02:34, Stefano Stabellini wrote:
> I managed to get back to read the mailing list and noticed this patch.
> 
> Is it still relevant and needs to be reviewed?
> 
> Are there any outstanding disagreements between maintainers on the
> approach to take here?  Or should I just go ahead and review it?

It is still relevant from my pov, and everything that may be controversial
is said ...

> On Tue, 9 Jan 2024, Jan Beulich wrote:
>> The type not being used in do_bug_frame() is suspicious. Apparently
>> that's solely because the type uses a pointer-to-const parameter,
>> when run_in_exception_handler() wants functions taking pointer-to-non-
>> const. Drop the const, in turn requiring Arm's do_bug_frame() to also
>> have its const dropped. This then brings that function also closer to
>> the common one, with Arm's use of vaddr_t remaining as a difference.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> This is an alternative proposal to
>> https://lists.xen.org/archives/html/xen-devel/2023-12/msg01385.html,
>> albeit without paving a road towards Andrew's desire of getting rid of
>> show_execution_state_nonconst() again. Retaining (and propagating) the
>> const would imply the need to cast away the const-ness somewhere on (at
>> least) the path to invoking gdb stub code. Personally I'm averse to such
>> casting away of const-ness ...

... here. Without Andrew commenting, I'm afraid it's not actually clear
whether he objects to this approach, or is meaning to tolerate it
silently, or actually views it as a step in a good direction, even if
not quite getting where earlier on he thought we may want to move to.

Jan
Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Andrew Cooper 3 months ago
On 24/01/2024 7:28 am, Jan Beulich wrote:
> On 24.01.2024 02:34, Stefano Stabellini wrote:
>> I managed to get back to read the mailing list and noticed this patch.
>>
>> Is it still relevant and needs to be reviewed?
>>
>> Are there any outstanding disagreements between maintainers on the
>> approach to take here?  Or should I just go ahead and review it?
> It is still relevant from my pov, and everything that may be controversial
> is said ...

BUGFRAME_* cannot legitimately modify the interrupted context.  Two are
fatal paths, and other two are side-effect-less as far as C can tell.

So the infrastructure ought to take a const pointer.

The reason why this pointer is non-const is to do with the interaction
of the serial and keyhandler infrastructures.  Because we're adjusting
that for other reasons, I was hoping it would subsequently be easy to
switch Xen to being properly const in this regard.

Turns out it is:

 
https://gitlab.com/xen-project/people/andyhhp/xen/-/commit/4f857075005da1d28632e4f9198c2e7d0f404b9a

with a couple of caveats.  (Only the buster-gcc-ibt run failed, so I've
got some cf_check-ing to adjust, but all the other builds worked fine).


To make the serial code compile, I ended up having to revert patch 2 of
the regs series, which I believe is safe to do following patch 3-5 which
un-plumb the regs pointer deeper in the call chain.  If this is turns
out to be true, then the patch ought to be added and reverted in the
same series so it isn't left hanging about after the fact.

The _$X_poll() functions are used in timer context, which means there's
an outer regs context already latched, and that's arguably a better
context to use anyway for 'd'.

This in turn allows us to remove a #UD from a fast(ish) path, and remove
some per-cpu or static variables which are just used for non-standard
parameter passing because run_in_exception_handler() doesn't let you
pass any.


This leaves the '%' debugger infrastructure.  Being a debugger, it's
making arbitrary changes anyway and I'd much rather cast away constness
for a debugger, than to keep everything else mutable when it oughtn't to be.

If absolutely nothing else, registration and handling '%' ought to be
from x86 code rather than common code, which would remove the
do_debugger_trap_fatal() layering violation.

But, the more I look into the gdbstub the more I'm convinced that it
doesn't work.  For example, this gem:

/* Resuming after we've stopped used to work, but more through luck than
any actual intention.  It doesn't at the moment. */

From c/s b69f92f3012 in July 2004, and more specifically the commit
which added the gdbstub functionality to begin with.  I.e. it doesn't
appear to have ever supported more than "poke around in the crashed
state".  In the 2 decades that noone has fixed this, we've gained far
better technologies for doing this, such as running it in a VM.

I am going to submit some patches deleting gdbstub.  It clearly had not
much value to begin with, and is not definitely not worth the problems
it is creating in adjacent code these days.

~Andrew

Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Jan Beulich 3 months ago
On 25.01.2024 02:14, Andrew Cooper wrote:
> On 24/01/2024 7:28 am, Jan Beulich wrote:
>> On 24.01.2024 02:34, Stefano Stabellini wrote:
>>> I managed to get back to read the mailing list and noticed this patch.
>>>
>>> Is it still relevant and needs to be reviewed?
>>>
>>> Are there any outstanding disagreements between maintainers on the
>>> approach to take here?  Or should I just go ahead and review it?
>> It is still relevant from my pov, and everything that may be controversial
>> is said ...
> 
> BUGFRAME_* cannot legitimately modify the interrupted context.  Two are
> fatal paths, and other two are side-effect-less as far as C can tell.
> 
> So the infrastructure ought to take a const pointer.
> 
> The reason why this pointer is non-const is to do with the interaction
> of the serial and keyhandler infrastructures.  Because we're adjusting
> that for other reasons, I was hoping it would subsequently be easy to
> switch Xen to being properly const in this regard.
> 
> Turns out it is:
> 
>  
> https://gitlab.com/xen-project/people/andyhhp/xen/-/commit/4f857075005da1d28632e4f9198c2e7d0f404b9a
> 
> with a couple of caveats.  (Only the buster-gcc-ibt run failed, so I've
> got some cf_check-ing to adjust, but all the other builds worked fine).
> 
> 
> To make the serial code compile, I ended up having to revert patch 2 of
> the regs series, which I believe is safe to do following patch 3-5 which
> un-plumb the regs pointer deeper in the call chain.  If this is turns
> out to be true, then the patch ought to be added and reverted in the
> same series so it isn't left hanging about after the fact.

Looking further into this, I can't see how reverting ought to be possible,
even less so specifically after patch 5. Patches 4 and 5 merely eliminate
now unused parameters. Hence imo if it could be reverted after 5, it also
ought to be fine to revert after 3. Which in turn it would mean it's not
needed at all. Which I simply cannot see (yet?).

Jan

Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Jan Beulich 3 months ago
On 25.01.2024 02:14, Andrew Cooper wrote:
> On 24/01/2024 7:28 am, Jan Beulich wrote:
>> On 24.01.2024 02:34, Stefano Stabellini wrote:
>>> I managed to get back to read the mailing list and noticed this patch.
>>>
>>> Is it still relevant and needs to be reviewed?
>>>
>>> Are there any outstanding disagreements between maintainers on the
>>> approach to take here?  Or should I just go ahead and review it?
>> It is still relevant from my pov, and everything that may be controversial
>> is said ...
> 
> BUGFRAME_* cannot legitimately modify the interrupted context.  Two are
> fatal paths, and other two are side-effect-less as far as C can tell.
> 
> So the infrastructure ought to take a const pointer.
> 
> The reason why this pointer is non-const is to do with the interaction
> of the serial and keyhandler infrastructures.  Because we're adjusting
> that for other reasons, I was hoping it would subsequently be easy to
> switch Xen to being properly const in this regard.
> 
> Turns out it is:
> 
>  
> https://gitlab.com/xen-project/people/andyhhp/xen/-/commit/4f857075005da1d28632e4f9198c2e7d0f404b9a
> 
> with a couple of caveats.  (Only the buster-gcc-ibt run failed, so I've
> got some cf_check-ing to adjust, but all the other builds worked fine).
> 
> 
> To make the serial code compile, I ended up having to revert patch 2 of
> the regs series, which I believe is safe to do following patch 3-5 which
> un-plumb the regs pointer deeper in the call chain.  If this is turns
> out to be true, then the patch ought to be added and reverted in the
> same series so it isn't left hanging about after the fact.

Hmm, I'm not sure I see how reverting that would end up working. However,
aiui you need to revert primarily for the non-const-ness of the pointers
involved in [gs]et_irq_regs(). I wonder whether, if we followed your
underlying thought here, those shouldn't be const-ified then anyway.

> The _$X_poll() functions are used in timer context, which means there's
> an outer regs context already latched, and that's arguably a better
> context to use anyway for 'd'.

If the timer happens to run on an idle vCPU, what "outer regs context"
would we have there?

> This in turn allows us to remove a #UD from a fast(ish) path, and remove
> some per-cpu or static variables which are just used for non-standard
> parameter passing because run_in_exception_handler() doesn't let you
> pass any.
> 
> 
> This leaves the '%' debugger infrastructure.  Being a debugger, it's
> making arbitrary changes anyway and I'd much rather cast away constness
> for a debugger, than to keep everything else mutable when it oughtn't to be.
> 
> If absolutely nothing else, registration and handling '%' ought to be
> from x86 code rather than common code, which would remove the
> do_debugger_trap_fatal() layering violation.
> 
> But, the more I look into the gdbstub the more I'm convinced that it
> doesn't work.  For example, this gem:
> 
> /* Resuming after we've stopped used to work, but more through luck than
> any actual intention.  It doesn't at the moment. */
> 
> From c/s b69f92f3012 in July 2004, and more specifically the commit
> which added the gdbstub functionality to begin with.  I.e. it doesn't
> appear to have ever supported more than "poke around in the crashed
> state".  In the 2 decades that noone has fixed this, we've gained far
> better technologies for doing this, such as running it in a VM.
> 
> I am going to submit some patches deleting gdbstub.  It clearly had not
> much value to begin with, and is not definitely not worth the problems
> it is creating in adjacent code these days.

All fine. Still I wonder whether in the meantime this patch isn't an
improvement on its own, and hence whether the const couldn't sensibly
be added subsequently.

Jan

Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Andrew Cooper 3 months ago
Answering out of order:

On 25/01/2024 8:15 am, Jan Beulich wrote:
> All fine. Still I wonder whether in the meantime this patch isn't an
> improvement on its own, and hence whether the const couldn't sensibly
> be added subsequently.

We have a while until 4.19 goes out.  I would prefer to try and get this
untangled properly, because half of your patch is in the opposite
direction for getting the const-ness working.

If we start to hit rc1 and the const-ness isn't complete, we can revisit.


Regarding the removal of gdb-stub.  I'd like to get that done, and
rebase the remainder of the IRQ series over it, because it will reduce
the churn in your IRQ series.  I'm happy to do the rebase if you want.


> On 25.01.2024 02:14, Andrew Cooper wrote:
>> To make the serial code compile, I ended up having to revert patch 2 of
>> the regs series, which I believe is safe to do following patch 3-5 which
>> un-plumb the regs pointer deeper in the call chain.  If this is turns
>> out to be true, then the patch ought to be added and reverted in the
>> same series so it isn't left hanging about after the fact.
> Hmm, I'm not sure I see how reverting that would end up working. However,
> aiui you need to revert primarily for the non-const-ness of the pointers
> involved in [gs]et_irq_regs(). I wonder whether, if we followed your
> underlying thought here, those shouldn't be const-ified then anyway.
>
>> The _$X_poll() functions are used in timer context, which means there's
>> an outer regs context already latched, and that's arguably a better
>> context to use anyway for 'd'.
> If the timer happens to run on an idle vCPU, what "outer regs context"
> would we have there?

The only reason the serial infrastructure was setting up a fake IRQ
context was because it was using run_in_exception_handler().

But I (think I have) removed that fully (and it simplifies things more
than I was hoping).

With '%' deleted, it's only 'd' that cares, isn't it?  And that's "dump
the current regs", rather than wanting something else.

~Andrew

Re: [PATCH] consolidate do_bug_frame() / bug_fn_t
Posted by Jan Beulich 3 months ago
On 25.01.2024 13:10, Andrew Cooper wrote:
> Answering out of order:
> 
> On 25/01/2024 8:15 am, Jan Beulich wrote:
>> All fine. Still I wonder whether in the meantime this patch isn't an
>> improvement on its own, and hence whether the const couldn't sensibly
>> be added subsequently.
> 
> We have a while until 4.19 goes out.  I would prefer to try and get this
> untangled properly, because half of your patch is in the opposite
> direction for getting the const-ness working.
> 
> If we start to hit rc1 and the const-ness isn't complete, we can revisit.
> 
> 
> Regarding the removal of gdb-stub.  I'd like to get that done, and
> rebase the remainder of the IRQ series over it, because it will reduce
> the churn in your IRQ series.  I'm happy to do the rebase if you want.

Shouldn't be overly difficult, so I guess I can as well do it before
(eventually) re-submitting.

>> On 25.01.2024 02:14, Andrew Cooper wrote:
>>> To make the serial code compile, I ended up having to revert patch 2 of
>>> the regs series, which I believe is safe to do following patch 3-5 which
>>> un-plumb the regs pointer deeper in the call chain.  If this is turns
>>> out to be true, then the patch ought to be added and reverted in the
>>> same series so it isn't left hanging about after the fact.
>> Hmm, I'm not sure I see how reverting that would end up working. However,
>> aiui you need to revert primarily for the non-const-ness of the pointers
>> involved in [gs]et_irq_regs(). I wonder whether, if we followed your
>> underlying thought here, those shouldn't be const-ified then anyway.
>>
>>> The _$X_poll() functions are used in timer context, which means there's
>>> an outer regs context already latched, and that's arguably a better
>>> context to use anyway for 'd'.
>> If the timer happens to run on an idle vCPU, what "outer regs context"
>> would we have there?
> 
> The only reason the serial infrastructure was setting up a fake IRQ
> context was because it was using run_in_exception_handler().
> 
> But I (think I have) removed that fully (and it simplifies things more
> than I was hoping).
> 
> With '%' deleted, it's only 'd' that cares, isn't it?

Yes.

>  And that's "dump the current regs", rather than wanting something else.

Question remains though: What are "current regs" when on an idle vCPU?
But perhaps that will clarify itself once I see how you "removed that
fully".

Jan