[PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp

Daniel Henrique Barboza posted 9 patches 2 years, 2 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220307191553.429236-1-danielhb413@gmail.com
Maintainers: "Cédric Le Goater" <clg@kaod.org>, Daniel Henrique Barboza <danielhb413@gmail.com>, David Gibson <david@gibson.dropbear.id.au>, Greg Kurz <groug@kaod.org>
hw/ppc/spapr_hcall.c   | 98 +++++++++++++++++++++++++++++++++++++++++-
include/hw/ppc/spapr.h |  3 ++
include/qemu/log.h     |  3 +-
util/log.c             |  2 +
4 files changed, 103 insertions(+), 3 deletions(-)
[PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Daniel Henrique Barboza 2 years, 2 months ago
Hi,

I got a lot of noise trying to debug an AIX guest in a pseries machine when running with 
'-d unimp'. The reason is that there is no distinction between features
(in my case, hypercalls) that are unimplemented because we never considered,
versus features that we made a design choice not to implement.

This series adds a new log type, LOG_UNSUPP, as a way to filter the
second case. After changing the log level of existing unsupported
pseries hypercalls, -d unimp was reporting just the ones that I need to
worry about and decide whether we should implement it or mark as
unsupported in our model. After this series there's still one hypercall
thgat is being thrown by AIX. We'll deal with that another day.

I am posting this a little too close of the soft freeze. If we manage
to get it reviewed in time, since it's a debug QoL change that doesn't
break anything, I'd say we take it. 

Also, I didn't find the maintainer for the util/log.c file. Let me know
if I need to CC someone in special for this file.

Daniel Henrique Barboza (9):
  util/log.c: add LOG_UNSUPP type
  hw/ppc/spapr_hcall.c: log h_clean_slb() as unsupported
  hw/ppc/spapr_hcall.c: log h_invalidate_pid() as unsupported
  hw/ppc/spapr_hcall.c: log h_copy_tofrom_guest() as unsupported
  hw/ppc/spapr_hcall.c: log H_GET_EM_PARMS as unsupported
  hw/ppc/spapr_hcall.c: log H_BEST_ENERGY as unsupported
  hw/ppc/spapr_hcall.c: log H_QUERY_VAS_CAPABILITIES as unsupported
  hw/ppc/spapr_hcall.c: log H_GET_PPP as unsupported
  hw/ppc/spapr_hcall.c: log H_VIOCTL as unsupported

 hw/ppc/spapr_hcall.c   | 98 +++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr.h |  3 ++
 include/qemu/log.h     |  3 +-
 util/log.c             |  2 +
 4 files changed, 103 insertions(+), 3 deletions(-)

-- 
2.35.1
Re: [PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Peter Maydell 2 years, 2 months ago
On Mon, 7 Mar 2022 at 19:19, Daniel Henrique Barboza
<danielhb413@gmail.com> wrote:
>
> Hi,
>
> I got a lot of noise trying to debug an AIX guest in a pseries machine when running with
> '-d unimp'. The reason is that there is no distinction between features
> (in my case, hypercalls) that are unimplemented because we never considered,
> versus features that we made a design choice not to implement.
>
> This series adds a new log type, LOG_UNSUPP, as a way to filter the
> second case. After changing the log level of existing unsupported
> pseries hypercalls, -d unimp was reporting just the ones that I need to
> worry about and decide whether we should implement it or mark as
> unsupported in our model. After this series there's still one hypercall
> thgat is being thrown by AIX. We'll deal with that another day.

So the intention of the distinction is:
  LOG_UNIMP: we don't implement this, but we should
  LOG_UNSUPP: we don't implement this, and that's OK because it's optional

?

I think I'd be happier about adding a new log category if we had
some examples of where we should be using it other than just in
the spapr hcall code, to indicate that it's a bit more broadly
useful. If this is a distinction that only makes sense for that
narrow use case, then as Philippe says a tracepoint might be a
better choice.

thanks
-- PMM
Re: [PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Daniel Henrique Barboza 2 years, 2 months ago

On 3/7/22 17:21, Peter Maydell wrote:
> On Mon, 7 Mar 2022 at 19:19, Daniel Henrique Barboza
> <danielhb413@gmail.com> wrote:
>>
>> Hi,
>>
>> I got a lot of noise trying to debug an AIX guest in a pseries machine when running with
>> '-d unimp'. The reason is that there is no distinction between features
>> (in my case, hypercalls) that are unimplemented because we never considered,
>> versus features that we made a design choice not to implement.
>>
>> This series adds a new log type, LOG_UNSUPP, as a way to filter the
>> second case. After changing the log level of existing unsupported
>> pseries hypercalls, -d unimp was reporting just the ones that I need to
>> worry about and decide whether we should implement it or mark as
>> unsupported in our model. After this series there's still one hypercall
>> thgat is being thrown by AIX. We'll deal with that another day.
> 
> So the intention of the distinction is:
>    LOG_UNIMP: we don't implement this, but we should
>    LOG_UNSUPP: we don't implement this, and that's OK because it's optional
> 
> ?

The idea is that LOG_UNIMP is too broad and it's used to indicate features that are
unknown to QEMU and also features that QEMU knows about but does not support it. It's
not necessarily a way of telling "we should implement this" but more like "we know/do
not know what this is".


> 
> I think I'd be happier about adding a new log category if we had
> some examples of where we should be using it other than just in
> the spapr hcall code, to indicate that it's a bit more broadly
> useful. If this is a distinction that only makes sense for that
> narrow use case, then as Philippe says a tracepoint might be a
> better choice.

target/arm/translate.c, do_coproc_insn():


     /* Unknown register; this might be a guest error or a QEMU
      * unimplemented feature.
      */
     if (is64) {
         qemu_log_mask(LOG_UNIMP, "%s access to unsupported AArch32 "
                       "64 bit system register cp:%d opc1: %d crm:%d "
                       "(%s)\n",
                       isread ? "read" : "write", cpnum, opc1, crm,
                       s->ns ? "non-secure" : "secure");
     } else {
         qemu_log_mask(LOG_UNIMP, "%s access to unsupported AArch32 "
                       "system register cp:%d opc1:%d crn:%d crm:%d opc2:%d "
                       "(%s)\n",
                       isread ? "read" : "write", cpnum, opc1, crn, crm, opc2,
                       s->ns ? "non-secure" : "secure");
     }

This use of LOG_UNIMP is logging something that we don't know about, it's unknown.
And hw/arm/smmuv3.c, decode_ste():


     if (STE_CFG_S2_ENABLED(config)) {
         qemu_log_mask(LOG_UNIMP, "SMMUv3 does not support stage 2 yet\n");
         goto bad_ste;
     }

     if (STE_S1CDMAX(ste) != 0) {
         qemu_log_mask(LOG_UNIMP,
                       "SMMUv3 does not support multiple context descriptors yet\n");
         goto bad_ste;
     }

     if (STE_S1STALLD(ste)) {
         qemu_log_mask(LOG_UNIMP,
                       "SMMUv3 S1 stalling fault model not allowed yet\n");
         goto bad_ste;
     }


This is something we know what it is and are deciding not to support it. Both are being
logged as LOG_UNIMP. This is the distinction I was trying to achieve with this new
log type. The example in decode_ste() could be logged as LOG_UNSUPP.


Perhaps an easier sell would be adding a "LOG_UNKNOWN" for the features/regs/etc that
we do not know about. The semantics/intention seems clearer than what I'm trying to
achieve with LOG_UNSUPP.


And yeah, if this is too convoluted to push forward I can always tracepoint like Philippe
suggested as well.



Thanks,


Daniel



> 
> thanks
> -- PMM
Re: [PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Peter Maydell 2 years, 1 month ago
On Mon, 7 Mar 2022 at 22:00, Daniel Henrique Barboza
<danielhb413@gmail.com> wrote:
>
>
>
> On 3/7/22 17:21, Peter Maydell wrote:
> > On Mon, 7 Mar 2022 at 19:19, Daniel Henrique Barboza
> > <danielhb413@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I got a lot of noise trying to debug an AIX guest in a pseries machine when running with
> >> '-d unimp'. The reason is that there is no distinction between features
> >> (in my case, hypercalls) that are unimplemented because we never considered,
> >> versus features that we made a design choice not to implement.
> >>
> >> This series adds a new log type, LOG_UNSUPP, as a way to filter the
> >> second case. After changing the log level of existing unsupported
> >> pseries hypercalls, -d unimp was reporting just the ones that I need to
> >> worry about and decide whether we should implement it or mark as
> >> unsupported in our model. After this series there's still one hypercall
> >> thgat is being thrown by AIX. We'll deal with that another day.
> >
> > So the intention of the distinction is:
> >    LOG_UNIMP: we don't implement this, but we should
> >    LOG_UNSUPP: we don't implement this, and that's OK because it's optional
> >
> > ?
>
> The idea is that LOG_UNIMP is too broad and it's used to indicate features that are
> unknown to QEMU and also features that QEMU knows about but does not support it. It's
> not necessarily a way of telling "we should implement this" but more like "we know/do
> not know what this is".

From the point of view of debugging the guest, I don't care
whether the QEMU developers know that they've not got round
to something or whether they've just forgotten it. I care
about "is this because I, the guest program, did something wrong,
or is it because QEMU is not completely emulating something
I should really be able to expect to be present". This is why we
distinguish LOG_UNIMP from LOG_GUEST_ERROR.

> > I think I'd be happier about adding a new log category if we had
> > some examples of where we should be using it other than just in
> > the spapr hcall code, to indicate that it's a bit more broadly
> > useful. If this is a distinction that only makes sense for that
> > narrow use case, then as Philippe says a tracepoint might be a
> > better choice.
>
> target/arm/translate.c, do_coproc_insn():

> This use of LOG_UNIMP is logging something that we don't know about, it's unknown.

(Some of the things that get logged here will really be things that
we conceptually "know about" and don't implement -- the logging
is a catch-all for any kind of unimplemented register, whether the
specs define it or not.)

> And hw/arm/smmuv3.c, decode_ste():

> This is something we know what it is and are deciding not to support it. Both are being
> logged as LOG_UNIMP. This is the distinction I was trying to achieve with this new
> log type. The example in decode_ste() could be logged as LOG_UNSUPP.

I don't see much benefit in distinguishing these two cases, to be
honest. You could maybe have sold me on "you're accessing something
that is optional and we happen not to provide it" vs "you're
accessing something that should be there and isn't", because that's
a distinction that guest code authors might plausibly care about.
To the extent that you want to helpfully say "this is because
QEMU doesn't implement an entire feature" you can say that in the
free-form text message.

-- PMM
Re: [PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Daniel Henrique Barboza 2 years, 1 month ago

On 3/8/22 06:18, Peter Maydell wrote:
> On Mon, 7 Mar 2022 at 22:00, Daniel Henrique Barboza
> <danielhb413@gmail.com> wrote:
>>
>>
>>
>> On 3/7/22 17:21, Peter Maydell wrote:
>>> On Mon, 7 Mar 2022 at 19:19, Daniel Henrique Barboza
>>> <danielhb413@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I got a lot of noise trying to debug an AIX guest in a pseries machine when running with
>>>> '-d unimp'. The reason is that there is no distinction between features
>>>> (in my case, hypercalls) that are unimplemented because we never considered,
>>>> versus features that we made a design choice not to implement.
>>>>
>>>> This series adds a new log type, LOG_UNSUPP, as a way to filter the
>>>> second case. After changing the log level of existing unsupported
>>>> pseries hypercalls, -d unimp was reporting just the ones that I need to
>>>> worry about and decide whether we should implement it or mark as
>>>> unsupported in our model. After this series there's still one hypercall
>>>> thgat is being thrown by AIX. We'll deal with that another day.
>>>
>>> So the intention of the distinction is:
>>>     LOG_UNIMP: we don't implement this, but we should
>>>     LOG_UNSUPP: we don't implement this, and that's OK because it's optional
>>>
>>> ?
>>
>> The idea is that LOG_UNIMP is too broad and it's used to indicate features that are
>> unknown to QEMU and also features that QEMU knows about but does not support it. It's
>> not necessarily a way of telling "we should implement this" but more like "we know/do
>> not know what this is".
> 
>  From the point of view of debugging the guest, I don't care
> whether the QEMU developers know that they've not got round
> to something or whether they've just forgotten it. I care
> about "is this because I, the guest program, did something wrong,
> or is it because QEMU is not completely emulating something
> I should really be able to expect to be present". This is why we
> distinguish LOG_UNIMP from LOG_GUEST_ERROR.
> 
>>> I think I'd be happier about adding a new log category if we had
>>> some examples of where we should be using it other than just in
>>> the spapr hcall code, to indicate that it's a bit more broadly
>>> useful. If this is a distinction that only makes sense for that
>>> narrow use case, then as Philippe says a tracepoint might be a
>>> better choice.
>>
>> target/arm/translate.c, do_coproc_insn():
> 
>> This use of LOG_UNIMP is logging something that we don't know about, it's unknown.
> 
> (Some of the things that get logged here will really be things that
> we conceptually "know about" and don't implement -- the logging
> is a catch-all for any kind of unimplemented register, whether the
> specs define it or not.)
> 
>> And hw/arm/smmuv3.c, decode_ste():
> 
>> This is something we know what it is and are deciding not to support it. Both are being
>> logged as LOG_UNIMP. This is the distinction I was trying to achieve with this new
>> log type. The example in decode_ste() could be logged as LOG_UNSUPP.
> 
> I don't see much benefit in distinguishing these two cases, to be
> honest. You could maybe have sold me on "you're accessing something
> that is optional and we happen not to provide it" vs "you're
> accessing something that should be there and isn't", because that's
> a distinction that guest code authors might plausibly care about.
> To the extent that you want to helpfully say "this is because
> QEMU doesn't implement an entire feature" you can say that in the
> free-form text message.


Fair enough. I'll rely on tracepointing the cases where I think this distinction
is helpful for debugging.



Thanks,


Daniel

> 
> -- PMM
Re: [PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Philippe Mathieu-Daudé 2 years, 1 month ago
On 8/3/22 10:18, Peter Maydell wrote:
> On Mon, 7 Mar 2022 at 22:00, Daniel Henrique Barboza
> <danielhb413@gmail.com> wrote:
>> On 3/7/22 17:21, Peter Maydell wrote:
>>> On Mon, 7 Mar 2022 at 19:19, Daniel Henrique Barboza
>>> <danielhb413@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I got a lot of noise trying to debug an AIX guest in a pseries machine when running with
>>>> '-d unimp'. The reason is that there is no distinction between features
>>>> (in my case, hypercalls) that are unimplemented because we never considered,
>>>> versus features that we made a design choice not to implement.
>>>>
>>>> This series adds a new log type, LOG_UNSUPP, as a way to filter the
>>>> second case. After changing the log level of existing unsupported
>>>> pseries hypercalls, -d unimp was reporting just the ones that I need to
>>>> worry about and decide whether we should implement it or mark as
>>>> unsupported in our model. After this series there's still one hypercall
>>>> thgat is being thrown by AIX. We'll deal with that another day.
>>>
>>> So the intention of the distinction is:
>>>     LOG_UNIMP: we don't implement this, but we should
>>>     LOG_UNSUPP: we don't implement this, and that's OK because it's optional
>>>
>>> ?
>>
>> The idea is that LOG_UNIMP is too broad and it's used to indicate features that are
>> unknown to QEMU and also features that QEMU knows about but does not support it. It's
>> not necessarily a way of telling "we should implement this" but more like "we know/do
>> not know what this is".
> 
>  From the point of view of debugging the guest, I don't care
> whether the QEMU developers know that they've not got round
> to something or whether they've just forgotten it. I care
> about "is this because I, the guest program, did something wrong,
> or is it because QEMU is not completely emulating something
> I should really be able to expect to be present". This is why we
> distinguish LOG_UNIMP from LOG_GUEST_ERROR.

I agree with this view. Another distinctions:

  * tracing API
    - have multiple backends to send the events
    - is optional, might be completely disabled in build
      (which is why we use it to debug or analyze perfs)

  * qemu_log() API
    - logs to stdout
    - is present in all build variants
      (so we can always look at guest misbehavior as
       Peter described).

LOG_UNSUPP doesn't add value wrt guest misbehavior IMO,
which is why I'd stick to trace events for this.

>>> I think I'd be happier about adding a new log category if we had
>>> some examples of where we should be using it other than just in
>>> the spapr hcall code, to indicate that it's a bit more broadly
>>> useful. If this is a distinction that only makes sense for that
>>> narrow use case, then as Philippe says a tracepoint might be a
>>> better choice.
>>
>> target/arm/translate.c, do_coproc_insn():
> 
>> This use of LOG_UNIMP is logging something that we don't know about, it's unknown.
> 
> (Some of the things that get logged here will really be things that
> we conceptually "know about" and don't implement -- the logging
> is a catch-all for any kind of unimplemented register, whether the
> specs define it or not.)
> 
>> And hw/arm/smmuv3.c, decode_ste():
> 
>> This is something we know what it is and are deciding not to support it. Both are being
>> logged as LOG_UNIMP. This is the distinction I was trying to achieve with this new
>> log type. The example in decode_ste() could be logged as LOG_UNSUPP.
> 
> I don't see much benefit in distinguishing these two cases, to be
> honest. You could maybe have sold me on "you're accessing something
> that is optional and we happen not to provide it" vs "you're
> accessing something that should be there and isn't", because that's
> a distinction that guest code authors might plausibly care about.
> To the extent that you want to helpfully say "this is because
> QEMU doesn't implement an entire feature" you can say that in the
> free-form text message.
> 
> -- PMM
>
Re: [PATCH 0/9] add LOG_UNSUPP log type + mark hcalls as unsupp
Posted by Philippe Mathieu-Daudé 2 years, 2 months ago
On 7/3/22 20:15, Daniel Henrique Barboza wrote:
> Hi,
> 
> I got a lot of noise trying to debug an AIX guest in a pseries machine when running with
> '-d unimp'. The reason is that there is no distinction between features
> (in my case, hypercalls) that are unimplemented because we never considered,
> versus features that we made a design choice not to implement.
> 
> This series adds a new log type, LOG_UNSUPP, as a way to filter the
> second case. After changing the log level of existing unsupported
> pseries hypercalls, -d unimp was reporting just the ones that I need to
> worry about and decide whether we should implement it or mark as
> unsupported in our model. After this series there's still one hypercall
> thgat is being thrown by AIX. We'll deal with that another day.

Why not simply use a trace_spapr_hcall_unsupported() event?

Then you can filter using '-trace spapr_hcall_unsupported'.