[RFC PATCH 0/5] target/ppc: initial SMT support in TCG

Nicholas Piggin posted 5 patches 10 months, 3 weeks ago
Failed in applying to current master (apply log)
hw/ppc/ppc.c                                  |  6 ++
hw/ppc/spapr.c                                |  4 +-
hw/ppc/spapr_cpu_core.c                       |  7 +-
include/hw/ppc/ppc.h                          |  1 +
target/ppc/cpu.h                              | 16 +++-
target/ppc/cpu_init.c                         |  5 +
target/ppc/excp_helper.c                      | 86 ++++++++++++-----
target/ppc/gdbstub.c                          | 32 ++++---
target/ppc/helper.h                           |  4 +-
target/ppc/misc_helper.c                      | 93 +++++++++++++++++--
target/ppc/translate.c                        | 46 ++++++++-
.../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
12 files changed, 252 insertions(+), 50 deletions(-)
[RFC PATCH 0/5] target/ppc: initial SMT support in TCG
Posted by Nicholas Piggin 10 months, 3 weeks ago
Hi,

I'm posting this now just to get some first thoughts. I wouldn't say
it's ready but it does actually work with some basic tests including
pseries booting a Linux distro. I have powernv booting too, it just
requires some more SPRs converted, nothing fundamentally different so
for the purpose of this RFC I leave it out.

A couple of things, I don't know the object model well enough to do
something nice with topology. Iterating siblings I would have thought
should be going to parent core then iterating its children CPUs. Should
that be done with the object model, or is it better to add direct
pointers in CPUs to core and core to CPUs? It is (semi) important for
performance so maybe that is better than object iterators. If we go that
way, the PnvCore and SpaprCore have pointers to the SMT threads already,
should those be abstracted go in the CPUCore?

The other thing is the serialisation of access. It's using the atomic
single stepping for this which... I guess should be sufficient? Is it
the best way to do it though? Can a lock be used somehow instead?

Thanks,
Nick

Nicholas Piggin (5):
  target/ppc: gdbstub init spr gdb_id for all CPUs
  target/ppc: Add initial flags and helpers for SMT support
  target/ppc: Add support for SMT CTRL register
  target/ppc: Add msgsnd/p and DPDES SMT support
  spapr: Allow up to 8 threads SMT configuration

 hw/ppc/ppc.c                                  |  6 ++
 hw/ppc/spapr.c                                |  4 +-
 hw/ppc/spapr_cpu_core.c                       |  7 +-
 include/hw/ppc/ppc.h                          |  1 +
 target/ppc/cpu.h                              | 16 +++-
 target/ppc/cpu_init.c                         |  5 +
 target/ppc/excp_helper.c                      | 86 ++++++++++++-----
 target/ppc/gdbstub.c                          | 32 ++++---
 target/ppc/helper.h                           |  4 +-
 target/ppc/misc_helper.c                      | 93 +++++++++++++++++--
 target/ppc/translate.c                        | 46 ++++++++-
 .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
 12 files changed, 252 insertions(+), 50 deletions(-)

-- 
2.40.1
Re: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
Posted by Cédric Le Goater 10 months, 3 weeks ago
Hello Nick,

On 5/31/23 03:23, Nicholas Piggin wrote:
> Hi,
> 
> I'm posting this now just to get some first thoughts. I wouldn't say
> it's ready but it does actually work with some basic tests including
> pseries booting a Linux distro. I have powernv booting too, it just
> requires some more SPRs converted, nothing fundamentally different so
> for the purpose of this RFC I leave it out.
> 
> A couple of things, I don't know the object model well enough to do
> something nice with topology. Iterating siblings I would have thought
> should be going to parent core then iterating its children CPUs. Should
> that be done with the object model, or is it better to add direct
> pointers in CPUs to core and core to CPUs? It is (semi) important for> performance so maybe that is better than object iterators. If we go that
> way, the PnvCore and SpaprCore have pointers to the SMT threads already,
> should those be abstracted go in the CPUCore?

You should be able to move the thread array into the CPUCore. If you do
that, please check that migration compat is not impacted by the state
change. However, I am not sure you can use the CPUCore model under the
insn modeling. Something to check.

Anyhow, the way you implemented the loop on the siblings is sufficiently
fast for a small numbers of CPU and safe, w.r.t to CPU hotplug. So
I would leave that part for now, if it runs decently with 4*4 vCPUs in
TCG it should be fine.

Thanks,

C.


  
> The other thing is the serialisation of access. It's using the atomic
> single stepping for this which... I guess should be sufficient? Is it
> the best way to do it though? Can a lock be used somehow instead?
> 
> Thanks,
> Nick
> 
> Nicholas Piggin (5):
>    target/ppc: gdbstub init spr gdb_id for all CPUs
>    target/ppc: Add initial flags and helpers for SMT support
>    target/ppc: Add support for SMT CTRL register
>    target/ppc: Add msgsnd/p and DPDES SMT support
>    spapr: Allow up to 8 threads SMT configuration
> 
>   hw/ppc/ppc.c                                  |  6 ++
>   hw/ppc/spapr.c                                |  4 +-
>   hw/ppc/spapr_cpu_core.c                       |  7 +-
>   include/hw/ppc/ppc.h                          |  1 +
>   target/ppc/cpu.h                              | 16 +++-
>   target/ppc/cpu_init.c                         |  5 +
>   target/ppc/excp_helper.c                      | 86 ++++++++++++-----
>   target/ppc/gdbstub.c                          | 32 ++++---
>   target/ppc/helper.h                           |  4 +-
>   target/ppc/misc_helper.c                      | 93 +++++++++++++++++--
>   target/ppc/translate.c                        | 46 ++++++++-
>   .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
>   12 files changed, 252 insertions(+), 50 deletions(-)
>
Re: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
Posted by Nicholas Piggin 10 months, 3 weeks ago
On Thu Jun 1, 2023 at 5:56 PM AEST, Cédric Le Goater wrote:
> Hello Nick,
>
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > Hi,
> > 
> > I'm posting this now just to get some first thoughts. I wouldn't say
> > it's ready but it does actually work with some basic tests including
> > pseries booting a Linux distro. I have powernv booting too, it just
> > requires some more SPRs converted, nothing fundamentally different so
> > for the purpose of this RFC I leave it out.
> > 
> > A couple of things, I don't know the object model well enough to do
> > something nice with topology. Iterating siblings I would have thought
> > should be going to parent core then iterating its children CPUs. Should
> > that be done with the object model, or is it better to add direct
> > pointers in CPUs to core and core to CPUs? It is (semi) important for> performance so maybe that is better than object iterators. If we go that
> > way, the PnvCore and SpaprCore have pointers to the SMT threads already,
> > should those be abstracted go in the CPUCore?
>
> You should be able to move the thread array into the CPUCore. If you do
> that, please check that migration compat is not impacted by the state
> change. However, I am not sure you can use the CPUCore model under the
> insn modeling. Something to check.

Okay.

> Anyhow, the way you implemented the loop on the siblings is sufficiently
> fast for a small numbers of CPU and safe, w.r.t to CPU hotplug. So
> I would leave that part for now, if it runs decently with 4*4 vCPUs in
> TCG it should be fine.

Yeah you're right I'm overly paranoid about it but we don't do hundreds
of CPUs in TCG so it should be fine. Maybe I will defer it for now
then and just do the CPU iteration.

Thanks,
Nick
Re: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
Posted by Cédric Le Goater 10 months, 3 weeks ago
On 6/2/23 09:01, Nicholas Piggin wrote:
> On Thu Jun 1, 2023 at 5:56 PM AEST, Cédric Le Goater wrote:
>> Hello Nick,
>>
>> On 5/31/23 03:23, Nicholas Piggin wrote:
>>> Hi,
>>>
>>> I'm posting this now just to get some first thoughts. I wouldn't say
>>> it's ready but it does actually work with some basic tests including
>>> pseries booting a Linux distro. I have powernv booting too, it just
>>> requires some more SPRs converted, nothing fundamentally different so
>>> for the purpose of this RFC I leave it out.
>>>
>>> A couple of things, I don't know the object model well enough to do
>>> something nice with topology. Iterating siblings I would have thought
>>> should be going to parent core then iterating its children CPUs. Should
>>> that be done with the object model, or is it better to add direct
>>> pointers in CPUs to core and core to CPUs? It is (semi) important for> performance so maybe that is better than object iterators. If we go that
>>> way, the PnvCore and SpaprCore have pointers to the SMT threads already,
>>> should those be abstracted go in the CPUCore?
>>
>> You should be able to move the thread array into the CPUCore. If you do
>> that, please check that migration compat is not impacted by the state
>> change. However, I am not sure you can use the CPUCore model under the
>> insn modeling. Something to check.
> 
> Okay.
> 
>> Anyhow, the way you implemented the loop on the siblings is sufficiently
>> fast for a small numbers of CPU and safe, w.r.t to CPU hotplug. So
>> I would leave that part for now, if it runs decently with 4*4 vCPUs in
>> TCG it should be fine.
> 
> Yeah you're right I'm overly paranoid about it but we don't do hundreds
> of CPUs in TCG so it should be fine. 

The PowerNV did run with 64 CPUs at some point. Boot was slow bc of
contention in some areas when starting the secondaries. When stabilized,
perf was decent.

I think that a realistic goal for book3s is to support 2 sockets * 2 cores
* 4 threads on PowerNV and 16 vCPUs on pseries, this to exercise the various
ways to IPI on the different HV implementations.

> Maybe I will defer it for now then and just do the CPU iteration.

May be there is some value to store a CPU siblings under the PowerPC
CPU descriptor. IT could be useful for instructions that apply to the
current CPU but for others requiring a PIR value, you will have to
find a starting point in the CPU list, so it won't make much difference
I think.

Thanks,

C.



> 
> Thanks,
> Nick