[Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP

Igor Mammedov posted 6 patches 6 years, 6 months ago
Failed in applying to current master
[Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
This series allows configuring the NUMA mapping at runtime via the QMP/HMP
interface. To make that possible, it introduces a new '-paused' CLI option,
which pauses QEMU before machine_init() is run, and
adds new set-numa-node HMP/QMP commands which, in conjunction with
info hotpluggable-cpus/query-hotpluggable-cpus, allow configuring the
NUMA mapping for CPUs.

An HMP configuration session for the CLI options '-smp 1,maxcpus=2' would look like this:

(qemu) info hotpluggable-cpus 
Hotpluggable CPUs:
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  CPUInstance Properties:
    socket-id: "1"
    core-id: "0"
    thread-id: "0"
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  qom_path: "/machine/unattached/device[0]"
  CPUInstance Properties:
    socket-id: "0"
    core-id: "0"
    thread-id: "0"
(qemu) set-numa-node node,nodeid=0
(qemu) set-numa-node node,nodeid=1
(qemu) set-numa-node cpu,socket-id=0,node-id=0
(qemu) set-numa-node cpu,socket-id=1,node-id=1
(qemu) info hotpluggable-cpus 
Hotpluggable CPUs:
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  CPUInstance Properties:
    node-id: "1"
    socket-id: "1"
    core-id: "0"
    thread-id: "0"
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  CPUInstance Properties:
    node-id: "0"
    socket-id: "0"
    core-id: "0"
    thread-id: "0"
(qemu) cont
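For QMP consumers, the same session can be sketched as a list of command payloads. This is only an illustration: the "set-numa-node" argument names below are assumptions that mirror the HMP option syntax above; the authoritative wire format is whatever the series' qapi-schema.json changes define.

```python
# Sketch: QMP payloads corresponding to the HMP session above.
# The set-numa-node argument names are assumptions mirroring the HMP
# syntax; the real schema comes from the series' qapi-schema.json.

def qmp_cmd(name, **arguments):
    """Wrap a command name and its arguments in the standard QMP envelope."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return cmd

session = [
    qmp_cmd("query-hotpluggable-cpus"),
    qmp_cmd("set-numa-node", type="node", nodeid=0),
    qmp_cmd("set-numa-node", type="node", nodeid=1),
    qmp_cmd("set-numa-node", type="cpu", **{"socket-id": 0, "node-id": 0}),
    qmp_cmd("set-numa-node", type="cpu", **{"socket-id": 1, "node-id": 1}),
    qmp_cmd("cont"),
]
```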

git tree for testing:
  https://github.com/imammedo/qemu qmp_preconfig_rfc


CC: eblake@redhat.com
CC: armbru@redhat.com
CC: ehabkost@redhat.com
CC: pkrempa@redhat.com
CC: david@gibson.dropbear.id.au
CC: peter.maydell@linaro.org
CC: pbonzini@redhat.com
CC: cohuck@redhat.com

Igor Mammedov (6):
  numa: postpone options post-processing till machine_run_board_init()
  numa: split out NumaOptions parsing into parse_NumaOptions()
  possible_cpus: add CPUArchId::type field
  CLI: add -paused option
  HMP: add set-numa-node command
  QMP: add set-numa-node command

 hmp.h                      |  1 +
 include/hw/boards.h        |  2 ++
 include/sysemu/numa.h      |  2 ++
 include/sysemu/sysemu.h    |  1 +
 hmp-commands.hx            | 13 ++++++++
 hmp.c                      | 23 ++++++++++++++
 hw/arm/virt.c              |  3 +-
 hw/core/machine.c          | 18 ++++++-----
 hw/i386/pc.c               |  4 ++-
 hw/ppc/spapr.c             | 13 +++++---
 hw/s390x/s390-virtio-ccw.c |  1 +
 numa.c                     | 79 ++++++++++++++++++++++++++++++++++------------
 qapi-schema.json           | 13 ++++++++
 qemu-options.hx            | 15 +++++++++
 qmp.c                      |  5 +++
 vl.c                       | 54 ++++++++++++++++++++++++++++++-
 16 files changed, 210 insertions(+), 37 deletions(-)

-- 
2.7.4


Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> Series allows to configure NUMA mapping at runtime using QMP/HMP
> interface. For that to happen it introduces a new '-paused' CLI option
> which allows to pause QEMU before machine_init() is run and
> adds new set-numa-node HMP/QMP commands which in conjuction with
> info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> NUMA mapping for cpus.

What's the problem we're seeking to solve here, compared to what we currently
do for NUMA configuration?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Mon, Oct 16, 2017 at 05:36:36PM +0100, Daniel P. Berrange wrote:
> On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > interface. For that to happen it introduces a new '-paused' CLI option
> > which allows to pause QEMU before machine_init() is run and
> > adds new set-numa-node HMP/QMP commands which in conjuction with
> > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > NUMA mapping for cpus.
> 
> What's the problem we're seeking solve here compared to what we currently
> do for NUMA configuration ?

I don't completely understand what exactly Igor is trying to
solve, but this new mode would be very helpful for addressing the
issues mentioned at:

http://www.linux-kvm.org/images/4/46/03x06A-Eduardo_HabkostMachine-type_Introspection_and_Configuration_Where_Are_We_Going.pdf
(starting on slide 12)


-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Mon, 16 Oct 2017 17:36:36 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > interface. For that to happen it introduces a new '-paused' CLI option
> > which allows to pause QEMU before machine_init() is run and
> > adds new set-numa-node HMP/QMP commands which in conjuction with
> > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > NUMA mapping for cpus.  
> 
> What's the problem we're seeking solve here compared to what we currently
> do for NUMA configuration ?
From RHBZ1382425:
"
The current -numa CLI interface is quite limited in how it allows mapping
CPUs to NUMA nodes, as it requires providing cpu_index values which are
non-obvious and depend on machine/arch. As a result, libvirt has to
assume/re-implement the cpu_index allocation logic to provide valid
values for the -numa cpus=... QEMU CLI option.

QEMU now has the generic CPU hotplug interface in place and the ability to
query the possible CPU layout (with the QMP command query-hotpluggable-cpus);
however, that requires running QEMU once per machine type and topology
configuration (-M & -smp combination), which would be too taxing for the mgmt
layer to do.
The currently proposed idea to solve the issue is to do the NUMA mapping at
runtime:
 1. start QEMU in stopped mode with the needed -M & -smp configuration,
    but leave out the "-numa cpus" options
 2. query the possible CPU layout (query-hotpluggable-cpus)
 3. use a new QMP command to map CPUs to NUMA nodes in terms of the generic
    CPU hotplug interface (socket/core/thread)

    (commit 419fcde "numa: add '-numa cpu,...' option for property based
    node mapping" added a CLI option for topology-based mapping)

...
 4. continue VM execution
"



Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> On Mon, 16 Oct 2017 17:36:36 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > interface. For that to happen it introduces a new '-paused' CLI option
> > > which allows to pause QEMU before machine_init() is run and
> > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > NUMA mapping for cpus.  
> > 
> > What's the problem we're seeking solve here compared to what we currently
> > do for NUMA configuration ?
> From RHBZ1382425
> "
> Current -numa CLI interface is quite limited in terms that allow map
> CPUs to NUMA nodes as it requires to provide cpu_index values which 
> are non obvious and depend on machine/arch. As result libvirt has to
> assume/re-implement cpu_index allocation logic to provide valid 
> values for -numa cpus=... QEMU CLI option.

In broad terms, this problem applies to every device / object libvirt
asks QEMU to create. For everything else libvirt is able to assign an
"id" string, which it can then use to identify the thing later. The
CPU stuff is different because libvirt isn't able to provide 'id'
strings for each CPU - QEMU generates a pseudo-id internally which
libvirt has to infer. The latter is the same problem we had with
devices before '-device' was introduced, allowing 'id' naming.

IMHO we should take the same approach with CPUs and start modelling
the individual CPUs as something we can explicitly create with -object
or -device. That way libvirt can assign names and does not have to
care about CPU index values, and it all works just the same way as
any other device / object we create.

ie instead of:

  -smp 8,sockets=4,cores=2,threads=1
  -numa node,nodeid=0,cpus=0-3
  -numa node,nodeid=1,cpus=4-7

we could do:

  -object numa-node,id=numa0
  -object numa-node,id=numa1
  -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
  -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
  -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
  -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
  -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
  -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
  -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
  -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0

(perhaps -device instead of -object above, but that's a minor detail)
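The hypothetical -object syntax above is mechanically derivable from a sockets/cores/threads count, which is the point of the proposal: the management layer could name and place every CPU itself. A sketch reproducing the example layout (the numa-node/cpu object types are part of this proposal, not existing QEMU options):

```python
# Sketch: generate the hypothetical -object lines from a topology.
# Reproduces the example above: 8 CPUs, 4 sockets x 2 cores x 1 thread,
# split evenly across 2 NUMA nodes.

def cpu_objects(sockets, cores, threads, nodes):
    ncpus = sockets * cores * threads
    per_node = ncpus // nodes           # assumes an even split
    lines = [f"-object numa-node,id=numa{n}" for n in range(nodes)]
    for i in range(ncpus):
        socket = i // (cores * threads)
        core = (i // threads) % cores
        thread = i % threads
        node = i // per_node
        lines.append(f"-object cpu,id=cpu{i},node=numa{node},"
                     f"socket={socket},core={core},thread={thread}")
    return lines

lines = cpu_objects(sockets=4, cores=2, threads=1, nodes=2)
```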

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Laszlo Ersek 6 years, 6 months ago
On 10/17/17 17:07, Daniel P. Berrange wrote:
> On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
>> On Mon, 16 Oct 2017 17:36:36 +0100
>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>
>>> On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
>>>> Series allows to configure NUMA mapping at runtime using QMP/HMP
>>>> interface. For that to happen it introduces a new '-paused' CLI option
>>>> which allows to pause QEMU before machine_init() is run and
>>>> adds new set-numa-node HMP/QMP commands which in conjuction with
>>>> info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
>>>> NUMA mapping for cpus.  
>>>
>>> What's the problem we're seeking solve here compared to what we currently
>>> do for NUMA configuration ?
>> From RHBZ1382425
>> "
>> Current -numa CLI interface is quite limited in terms that allow map
>> CPUs to NUMA nodes as it requires to provide cpu_index values which 
>> are non obvious and depend on machine/arch. As result libvirt has to
>> assume/re-implement cpu_index allocation logic to provide valid 
>> values for -numa cpus=... QEMU CLI option.
> 
> In broad terms, this problem applies to every device / object libvirt
> asks QEMU to create. For everything else libvirt is able to assign a
> "id" string, which is can then use to identify the thing later. The
> CPU stuff is different because libvirt isn't able to provide 'id'
> strings for each CPU - QEMU generates a psuedo-id internally which
> libvirt has to infer.

Oh. This is the critical bit I've been missing.

Sorry about the noise I've made!

Thanks!
Laszlo




Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Tue, 17 Oct 2017 16:07:59 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> IMHO we should take the same approach with CPUs and start modelling 
> the individual CPUs as something we can explicitly create with -object
> or -device. That way libvirt can assign names and does not have to 
> care about CPU index values, and it all works just the same way as
> any other devices / object we create
> 
> ie instead of:
> 
>   -smp 8,sockets=4,cores=2,threads=1
>   -numa node,nodeid=0,cpus=0-3
>   -numa node,nodeid=1,cpus=4-7
> 
> we could do:
> 
>   -object numa-node,id=numa0
>   -object numa-node,id=numa1
>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
The follow-up question would be: where do "socket=3,core=1,thread=0"
come from? Currently these properties are a function of
(-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
runtime, after QEMU parses the -M and -smp options.

Either mgmt asks QEMU for the values, or it duplicates each board's
logic (including per-machine-version compat hacks) to be able to
generate the values/properties on its own.




Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> the follow up question would be where do "socket=3,core=1,thread=0"
> come from, currently these options are the function of
> (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> runtime after qemu parses -M and -smp options.

The sockets/cores/threads topology of CPUs is something that comes from
the libvirt guest XML config

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Tue, 17 Oct 2017 17:09:26 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > the follow up question would be where do "socket=3,core=1,thread=0"
> > come from, currently these options are the function of
> > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > runtime after qemu parses -M and -smp options.  
> 
> The sockets/cores/threads topology of CPUs is something that comes from
> the libvirt guest XML config
in this case, for libvirt to implement that it would need to know the following details:
   1: which machine/machine version supports which set of attributes
   2: the valid values for these properties, depending on machine/machine version/cpu type
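A rough sketch of that discovery step, assuming a management-app-side helper: given a simplified, illustrative query-hotpluggable-cpus style reply (loosely modeled on the PC machine output quoted at the top of this thread, not the exact QMP wire format), derive which topology properties the machine supports and which values are valid:

```python
# Illustrative sketch: derive supported topology properties and their valid
# values from a query-hotpluggable-cpus style reply, instead of hardcoding
# per-machine rules. SAMPLE_REPLY is a simplified stand-in.

SAMPLE_REPLY = [
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "qom-path": "/machine/unattached/device[0]",
     "props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
]

def supported_props(reply):
    """Names of the topology properties the machine expects for its slots."""
    names = set()
    for slot in reply:
        names.update(slot["props"])
    return names

def valid_values(reply, prop):
    """All values the machine exposes for one property, e.g. socket-id."""
    return sorted({slot["props"][prop] for slot in reply if prop in slot["props"]})

print(sorted(supported_props(SAMPLE_REPLY)))   # ['core-id', 'socket-id', 'thread-id']
print(valid_values(SAMPLE_REPLY, "socket-id"))  # [0, 1]
```

With something like this, points 1 and 2 above would both be answered by querying the machine rather than by per-machine knowledge baked into libvirt.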


> Regards,
> Daniel


Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Tue, Oct 17, 2017 at 06:18:59PM +0200, Igor Mammedov wrote:
> On Tue, 17 Oct 2017 17:09:26 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > [...]

Also, note that in the case of NUMA, having identifiers for CPU
objects themselves won't be enough. NUMA settings need
identifiers for CPU slots (even if they are still empty), and
those slots are provided by the machine, not created by the user.


> > The sockets/cores/threads topology of CPUs is something that comes from
> > the libvirt guest XML config
> in this case, for libvirt to implement that it would need to know the following details:
>    1: which machine/machine version supports which set of attributes
>    2: the valid values for these properties, depending on machine/machine version/cpu type

The big assumption in this series is that libvirt doesn't know in
advance what the possible slots for CPUs will look like on each
machine-type, and needs to query them using
query-hotpluggable-cpus.

But if this assumption were really true, it would be impossible
for the user to even decide how the NUMA topology will look,
wouldn't it?

Igor, are you able to give one example of how the user input
(libvirt XML) for configuring NUMA CPU binding could look if
the user didn't know yet what the available sockets/cores/threads
are?

-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Wed, 18 Oct 2017 10:59:11 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> [...]
> 
> Also, note that in the case of NUMA, having identifiers for CPU
> objects themselves won't be enough. NUMA settings need
> identifiers for CPU slots (even if they are still empty), and
> those slots are provided by the machine, not created by the user.
> 
> 
> > > The sockets/cores/threads topology of CPUs is something that comes from
> > > the libvirt guest XML config  
> > in this case, for libvirt to implement that it would need to know the following details:
> >    1: which machine/machine version supports which set of attributes
> >    2: the valid values for these properties, depending on machine/machine version/cpu type
> 
> The big assumption in this series is that libvirt doesn't know in
> advance what the possible slots for CPUs will look like on each
> machine-type, and needs to query them using
> query-hotpluggable-cpus.
yep, that's true, and it started with the introduction of 'device_add cpu',
where libvirt didn't know what to specify as options for a new cpu,
hence query-hotpluggable-cpus was added to provide that information.


> But if this assumption were really true, it would be impossible
> for the user to even decide how the NUMA topology will look,
> wouldn't it?
> 
> Igor, are you able to give one example of how the user input
> (libvirt XML) for configuring NUMA CPU binding could look if
> the user didn't know yet what the available sockets/cores/threads
> are?
not sure I parse the question, but looking at libvirt's domain docs
it mentions
  <numa>
    <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
    <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
  </numa>

here libvirt assumes that there are cpus with cpu-index in the range 0-7
(and probably duplicates the logic that calculates cpu-index).
If libvirt would continue to duplicate that logic, we could skip
implementing early runtime QMP in QEMU and also drop support for
query-hotpluggable-cpus, as libvirt would be able to compute
properties/values on its own.
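The duplication being described can be sketched as follows. This is a hypothetical helper, not actual libvirt code, and it only works while libvirt's guess about cpu_index allocation (a contiguous 0..N-1 range) matches QEMU's internal one:

```python
# Hypothetical sketch of the translation libvirt performs today: turn
# <cell id=... cpus='0-3'/> ranges into legacy "-numa node,cpus=..." CLI
# arguments. It silently depends on libvirt's cpu-index guess matching
# QEMU's internal allocation, which is exactly the fragility at issue.

def parse_range(spec):
    """'0-3' -> [0, 1, 2, 3]; '5' -> [5]; comma-separated lists allowed."""
    out = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        out.extend(range(int(lo), int(hi or lo) + 1))
    return out

def cells_to_cli(cells, smp_cpus):
    """Build '-numa node,...' args, checking indexes against a guessed range."""
    args, seen = [], set()
    for cell_id, spec in cells:
        cpus = set(parse_range(spec))
        assert cpus <= set(range(smp_cpus)), "cpu_index outside guessed range"
        assert not (cpus & seen), "cpu assigned to two cells"
        seen |= cpus
        args += ["-numa", "node,nodeid=%d,cpus=%s" % (cell_id, spec)]
    return args

print(cells_to_cli([(0, "0-3"), (1, "4-7")], 8))
```

If QEMU ever changed its cpu_index assignment, the range check above would not notice; the generated CLI would simply bind the wrong CPUs.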

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> On Wed, 18 Oct 2017 10:59:11 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > [...]
> > The big assumption in this series is that libvirt doesn't know in
> > advance what the possible slots for CPUs will look like on each
> > machine-type, and needs to query them using
> > query-hotpluggable-cpus.
> yep, that's true, and it started with the introduction of 'device_add cpu',
> where libvirt didn't know what to specify as options for a new cpu,
> hence query-hotpluggable-cpus was added to provide that information.
> 
> 
> > But if this assumption were really true, it would be impossible
> > for the user to even decide how the NUMA topology will look,
> > wouldn't it?
> > 
> > Igor, are you able to give one example of how the user input
> > (libvirt XML) for configuring NUMA CPU binding could look if
> > the user didn't know yet what the available sockets/cores/threads
> > are?
> not sure I parse the question, but looking at libvirt's domain docs
> it mentions
>   <numa>
>     <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
>     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
>   </numa>
> 
> here libvirt assumes that there are cpus with cpu-index in the range 0-7
> (and probably duplicates the logic that calculates cpu-index).
> If libvirt would continue to duplicate that logic, we could skip
> implementing early runtime QMP in QEMU and also drop support for
> query-hotpluggable-cpus, as libvirt would be able to compute
> properties/values on its own.

From the POV of the XML, these CPU numbers are *not* required to be
the same as any QEMU CPU index. This is just saying that we've got
a <vcpus>8</vcpus> element, and we want the first 4 CPUs in one node
and the second 4 in the second node.

If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
the XML POV, which uses 0-7 regardless. If there ever were such a
disjoint representation of CPU indexes, libvirt would have to remap
what's in the XML to match what's in QEMU.
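That remapping would be mechanical. A hypothetical sketch, assuming QEMU exposed its index list in guest-visible order (the 70-77 range is the contrived example from above, not a real allocation):

```python
# Hypothetical sketch of the remap described above: XML vcpu N is simply
# the Nth CPU in QEMU's ordering, whatever indexes QEMU uses internally.

def remap_cell_cpus(xml_cpus, qemu_cpu_indexes):
    return [qemu_cpu_indexes[n] for n in xml_cpus]

# XML cell cpus='0-3' against a contrived QEMU index range 70..77:
print(remap_cell_cpus([0, 1, 2, 3], list(range(70, 78))))  # [70, 71, 72, 73]
```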

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Wed, 18 Oct 2017 15:49:36 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> > [...]
> > not sure I parse the question, but looking at libvirt's domain docs
> > it mentions
> >   <numa>
> >     <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
> >     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
> >   </numa>
> > 
> > here libvirt assumes that there are cpus with cpu-index in the range 0-7
> > (and probably duplicates the logic that calculates cpu-index).
> > If libvirt would continue to duplicate that logic, we could skip
> > implementing early runtime QMP in QEMU and also drop support for
> > query-hotpluggable-cpus, as libvirt would be able to compute
> > properties/values on its own.
> 
> From the POV of the XML, these CPU numbers are *not* required to be
> the same as any QEMU CPU index. This is just saying that we've got
> a <vcpus>8</vcpus> element, and we want the first 4 CPUs in one node
> and the second 4 in the second node.
> 
> If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
> the XML POV, which uses 0-7 regardless. If there ever were such a
> disjoint representation of CPU indexes, libvirt would have to remap
> what's in the XML to match what's in QEMU.
that's what I'm saying: libvirt has to know which cpu-indexes are valid
to use so that it is able to build a CLI which works:
  "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7"
and if the algorithm that assigns cpu-indexes were to change on the QEMU
side, it would break libvirt.

now on to the newer interface
  "-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1"
libvirt would have to know that socket-id and the values 0-1 are valid;
now moving to spapr
  "-numa cpu,node-id=0,core-id=0 -numa cpu,node-id=1,core-id=8"
here the valid values are not so obvious: core-id values are a function
of "-smp"
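To make the spapr example above concrete: the valid core-id values step by the threads-per-core count from -smp. A hedged sketch that illustrates the dependency; the exact rule is machine- and version-specific, so treat this as an illustration rather than the authoritative formula:

```python
# Sketch: with spapr-style addressing, candidate core-ids come in steps of
# the threads-per-core value, so "-smp 16,threads=8" yields core-ids 0 and 8
# as in the example above. This shows why valid values depend on -smp.

def candidate_core_ids(smp_cpus, threads_per_core):
    assert smp_cpus % threads_per_core == 0
    return [i * threads_per_core for i in range(smp_cpus // threads_per_core)]

print(candidate_core_ids(16, 8))  # [0, 8]
```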

this series was written so that mgmt won't have to duplicate the
corresponding logic in QEMU, as libvirt didn't want to maintain
it, I'd assume because it's fragile. If libvirt would make up valid
properties/values on its own, we can forget about this series.

> Regards,
> Daniel


Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Wed, Oct 18, 2017 at 05:24:12PM +0200, Igor Mammedov wrote:
> On Wed, 18 Oct 2017 15:49:36 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> > > [...]
> > 
> > From the POV of the XML, these CPU numbers are *not* required to be
> > the same as any QEMU CPU index. This is just saying that we've got
> > a <vcpus>8</vcpus> element, and we want the first 4 CPUs in one node
> > and the second 4 in the second node.
> > 
> > If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
> > the XML POV, which uses 0-7 regardless. If there ever were such a
> > disjoint representation of CPU indexes, libvirt would have to remap
> > what's in the XML to match what's in QEMU.
> that's what I'm saying: libvirt has to know which cpu-indexes are valid
> to use so that it is able to build a CLI which works:
>   "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7"
> and if the algorithm that assigns cpu-indexes were to change on the QEMU
> side, it would break libvirt.

That's why I think QEMU should let libvirt assign 'id' values to each
CPU, just like we do for other devices/objects. That way QEMU can
have whatever CPU index numbering scheme it likes, and it has no
effect on the mgmt app.

> now on to the newer interface
>   "-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1"
> libvirt would have to know that socket-id and the values 0-1 are valid;
> now moving to spapr
>   "-numa cpu,node-id=0,core-id=0 -numa cpu,node-id=1,core-id=8"
> here the valid values are not so obvious: core-id values are a function
> of "-smp"
> 
> this series was written so that mgmt won't have to duplicate the
> corresponding logic in QEMU, as libvirt didn't want to maintain
> it, I'd assume because it's fragile. If libvirt would make up valid
> properties/values on its own, we can forget about this series.

From the libvirt POV, all we want to say is: have N sockets, each with M
cores, each with O threads. That is architecture agnostic, and it is what
I was trying to illustrate with my earlier proposed CLI syntax.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Wed, Oct 18, 2017 at 04:27:47PM +0100, Daniel P. Berrange wrote:
> On Wed, Oct 18, 2017 at 05:24:12PM +0200, Igor Mammedov wrote:
> > On Wed, 18 Oct 2017 15:49:36 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > [...]
> > > From the POV of the XML, these CPU numbers are *not* required to be
> > > the same as any QEMU CPU index. This is just saying that we've got
> > > a <vcpus>8</vcpus> element, and we want the first 4 CPUs in one node
> > > and the second 4 in the second node.
> > > 
> > > If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
> > > the XML POV, which uses 0-7 regardless. If there ever were such a
> > > disjoint representation of CPU indexes, libvirt would have to remap
> > > what's in the XML to match what's in QEMU.
> > that's what I'm saying: libvirt has to know which cpu-indexes are valid
> > to use so that it is able to build a CLI which works:
> >   "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7"
> > and if the algorithm that assigns cpu-indexes were to change on the QEMU
> > side, it would break libvirt.
> 
> That's why I think QEMU should let libvirt assign 'id' values to each
> CPU, just like we do for other devices/objects. That way QEMU can
> have whatever CPU index numbering scheme it likes, and it has no
> effect on the mgmt app.

Adding an intermediate ID doesn't seem to address the problem
at all: you would still need to tell QEMU which
socket/core/thread combination corresponds to which ID, and the
set of valid socket/core/thread IDs is defined by the
machine-type.
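Put differently, whatever ID scheme is layered on top, something still has to validate the requested tuple against the machine-provided slots. A rough sketch (the slot list is illustrative, as if taken from a query-hotpluggable-cpus reply):

```python
# Sketch of the check described above: a requested socket/core/thread tuple
# is only usable if it matches a slot the machine actually provides.
# MACHINE_SLOTS is an illustrative stand-in for machine-provided data.

MACHINE_SLOTS = [
    {"socket-id": 0, "core-id": 0, "thread-id": 0},
    {"socket-id": 1, "core-id": 0, "thread-id": 0},
]

def slot_exists(slots, want):
    """True if some machine slot matches every property in 'want'."""
    return any(all(s.get(k) == v for k, v in want.items()) for s in slots)

print(slot_exists(MACHINE_SLOTS, {"socket-id": 1, "core-id": 0}))  # True
print(slot_exists(MACHINE_SLOTS, {"socket-id": 3}))                # False
```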

> 
> > now on to the newer interface
> >   "-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1"
> > libvirt would have to know that socket-id and the values 0-1 are valid;
> > now moving to spapr
> >   "-numa cpu,node-id=0,core-id=0 -numa cpu,node-id=1,core-id=8"
> > here the valid values are not so obvious: core-id values are a function
> > of "-smp"
> > 
> > this series was written so that mgmt won't have to duplicate the
> > corresponding logic in QEMU, as libvirt didn't want to maintain
> > it, I'd assume because it's fragile. If libvirt would make up valid
> > properties/values on its own, we can forget about this series.
> 
> From the libvirt POV, all we want to say is: have N sockets, each with M
> cores, each with O threads. That is architecture agnostic and what I
> was trying to illustrate with my earlier proposed CLI syntax.

The set of valid socket/core/thread IDs accepted by QEMU is
currently machine-dependent.  libvirt shouldn't expect them to be
architecture agnostic.

Defining architecture agnostic rules for them to avoid the need
for query-hotpluggable-cpus would still be a valid proposal, but
it needs to be written down instead of being just an implicit
assumption from the libvirt side.

-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> On Tue, 17 Oct 2017 16:07:59 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >   
> > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > which allows to pause QEMU before machine_init() is run and
> > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > NUMA mapping for cpus.    
> > > > 
> > > > What's the problem we're seeking solve here compared to what we currently
> > > > do for NUMA configuration ?  
> > > From RHBZ1382425
> > > "
> > > Current -numa CLI interface is quite limited in terms that allow map
> > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > are non obvious and depend on machine/arch. As result libvirt has to
> > > assume/re-implement cpu_index allocation logic to provide valid 
> > > values for -numa cpus=... QEMU CLI option.  
> > 
> > In broad terms, this problem applies to every device / object libvirt
> > asks QEMU to create. For everything else libvirt is able to assign a
> > "id" string, which is can then use to identify the thing later. The
> > CPU stuff is different because libvirt isn't able to provide 'id'
> > strings for each CPU - QEMU generates a psuedo-id internally which
> > libvirt has to infer. The latter is the same problem we had with
> > devices before '-device' was introduced allowing 'id' naming.
> > 
> > IMHO we should take the same approach with CPUs and start modelling 
> > the individual CPUs as something we can explicitly create with -object
> > or -device. That way libvirt can assign names and does not have to 
> > care about CPU index values, and it all works just the same way as
> > any other devices / object we create
> > 
> > ie instead of:
> > 
> >   -smp 8,sockets=4,cores=2,threads=1
> >   -numa node,nodeid=0,cpus=0-3
> >   -numa node,nodeid=1,cpus=4-7
> > 
> > we could do:
> > 
> >   -object numa-node,id=numa0
> >   -object numa-node,id=numa1
> >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> the follow up question would be where do "socket=3,core=1,thread=0"
> come from, currently these options are the function of
> (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> runtime after qemu parses -M and -smp options.

NB, I realize my example was open to misinterpretation. The values I'm
illustrating here for socket=3,core=1,thread=0 are *not* ID values; they
are a plain enumeration of values, i.e. this is saying the 4th socket, the
2nd core and the 1st thread.  Internally QEMU might have the 2nd core
with a core-id of 8, or 7038, or whatever architecture-specific numbering
scheme makes sense, but that's not what the mgmt app gives at the CLI
level.
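The enumeration-vs-ID distinction can be sketched with a hypothetical mapping function. The spapr-style rule below (core-id stepping by the number of threads per core) is only an illustrative assumption matching the "core-id of 8" example above; the real mapping is whatever the machine type defines.

```python
# Illustrative only: translating a plain core enumeration (the "2nd core")
# into an arch-specific internal core-id.  On a spapr-like machine where
# core-ids step by threads-per-core, enumeration index 1 with 8 threads
# per core would land on internal core-id 8.
def spapr_style_core_id(core_index, threads_per_core):
    return core_index * threads_per_core

print(spapr_style_core_id(1, 8))  # 8 -- matches "-numa cpu,...,core-id=8"
```

The mgmt app would only ever speak in enumeration indexes; the remap to internal IDs stays inside QEMU.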


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Wed, Oct 18, 2017 at 04:30:10PM +0100, Daniel P. Berrange wrote:
> On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > On Tue, 17 Oct 2017 16:07:59 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >   
> > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > NUMA mapping for cpus.    
> > > > > 
> > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > do for NUMA configuration ?  
> > > > From RHBZ1382425
> > > > "
> > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > values for -numa cpus=... QEMU CLI option.  
> > > 
> > > In broad terms, this problem applies to every device / object libvirt
> > > asks QEMU to create. For everything else libvirt is able to assign a
> > > "id" string, which is can then use to identify the thing later. The
> > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > libvirt has to infer. The latter is the same problem we had with
> > > devices before '-device' was introduced allowing 'id' naming.
> > > 
> > > IMHO we should take the same approach with CPUs and start modelling 
> > > the individual CPUs as something we can explicitly create with -object
> > > or -device. That way libvirt can assign names and does not have to 
> > > care about CPU index values, and it all works just the same way as
> > > any other devices / object we create
> > > 
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > the follow up question would be where do "socket=3,core=1,thread=0"
> > come from, currently these options are the function of
> > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > runtime after qemu parses -M and -smp options.
> 
> NB, I realize my example was open to mis-interpretation. The values I'm
> illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> are a plain enumeration of values. ie this is saying the 4th socket, the
> 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> with a core-id of 8, or 7038 or whatever architecture specific numbering
> scheme makes sense, but that's not what the mgmt app gives at the CLI
> level

I believe we have been trying to avoid index numbers to identify
entities as a reaction to the bad experience we had with the
cpu_index/apic_id mess in the past.

An interface using arch-independent socket/core/thread indexes
(not arch-dependent IDs) like you propose in the paragraph above
could be a solution, as long as it is documented very clearly
(and we include automated testing for those constraints).  But
note that this is _not_ how the socket/core/thread IDs on the
"-device *-cpu" and -numa command-line options work today.

Also, this might solve the problem for CPU socket/core/thread
identification, but might not be enough for the messy device
address assignment rules that libvirt needs to duplicate in
src/qemu/qemu_domain_address.c today.

-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by David Gibson 6 years, 6 months ago
On Wed, Oct 18, 2017 at 06:22:40PM -0200, Eduardo Habkost wrote:
> On Wed, Oct 18, 2017 at 04:30:10PM +0100, Daniel P. Berrange wrote:
> > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > 
> > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >   
> > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > NUMA mapping for cpus.    
> > > > > > 
> > > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > > do for NUMA configuration ?  
> > > > > From RHBZ1382425
> > > > > "
> > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > > values for -numa cpus=... QEMU CLI option.  
> > > > 
> > > > In broad terms, this problem applies to every device / object libvirt
> > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > "id" string, which is can then use to identify the thing later. The
> > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > libvirt has to infer. The latter is the same problem we had with
> > > > devices before '-device' was introduced allowing 'id' naming.
> > > > 
> > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > the individual CPUs as something we can explicitly create with -object
> > > > or -device. That way libvirt can assign names and does not have to 
> > > > care about CPU index values, and it all works just the same way as
> > > > any other devices / object we create
> > > > 
> > > > ie instead of:
> > > > 
> > > >   -smp 8,sockets=4,cores=2,threads=1
> > > >   -numa node,nodeid=0,cpus=0-3
> > > >   -numa node,nodeid=1,cpus=4-7
> > > > 
> > > > we could do:
> > > > 
> > > >   -object numa-node,id=numa0
> > > >   -object numa-node,id=numa1
> > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > come from, currently these options are the function of
> > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > runtime after qemu parses -M and -smp options.
> > 
> > NB, I realize my example was open to mis-interpretation. The values I'm
> > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > are a plain enumeration of values. ie this is saying the 4th socket, the
> > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > level
> 
> I believe we have been trying to avoid index numbers to identify
> entities as a reaction to the bad experience we had with the
> cpu_index/apic_id mess in the past.
> 
> An interface using arch-independent socket/core/thread indexes
> (not arch-dependent IDs) like you propose in the paragraph above
> could be a solution, as long as it is documented very clearly
> (and we include automated testing for those constraints).  But
> note that this is _not_ how the socket/core/thread IDs on the
> "-device *-cpu" and -numa command-line options work today.
> 
> Also, this might solve the problem for CPU socket/core/thread
> identification, but might not be enough for the messy device
> address assignment rules that libvirt needs to duplicate in
> src/qemu/qemu_domain_address.c today.

Note that describing socket/core/thread tuples as arch independent (or
even machine independent) is.. debatable.  I mean it's flexible enough
that most platforms can be fit to that scheme without too much
straining.  But, there's no arch independent way of defining what each
level means in terms of its properties.

So, for example, on spapr - being paravirt - there's no real
distinction between cores and sockets, how you divide them up is
completely arbitrary.  I don't think we have any implemented, but it's
easy to imagine modelling a big server type machine with more than 3
natural layers of hierarchy (say, thread, core, chip,
multi-chip-module, big-honkin-drawer-of-processors, ...).
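A topology with more than three levels does not fit a fixed socket/core/thread triple, but it can be represented generically, for example as an ordered list of (level-name, count) pairs. This is purely an illustration of the data-model point, not an interface QEMU or libvirt provides.

```python
# Illustrative only: an arbitrarily deep topology as ordered
# (level-name, count) pairs instead of a fixed 3-level triple.
from math import prod

topology = [
    ("drawer", 2),
    ("module", 2),
    ("chip", 2),
    ("core", 4),
    ("thread", 8),
]

# Total logical CPUs is just the product of the per-level counts.
total_threads = prod(count for _, count in topology)
print(total_threads)  # 256
```

Any fixed-arity CLI syntax (sockets/cores/threads) would have to flatten such a hierarchy, losing exactly the structure David describes.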

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Paolo Bonzini 6 years, 6 months ago
On 19/10/2017 13:49, David Gibson wrote:
> Note that describing socket/core/thread tuples as arch independent (or
> even machine independent) is.. debatable.  I mean it's flexible enough
> that most platforms can be fit to that scheme without too much
> straining.  But, there's no arch independent way of defining what each
> level means in terms of its properties.
> 
> So, for example, on spapr - being paravirt - there's no real
> distinction between cores and sockets, how you divide them up is
> completely arbitrary.

Same on x86, actually.

It's _common_ that cores on the same socket share L3 cache and that a
socket spans an integer number of NUMA nodes, but it doesn't have to be
that way.

QEMU currently enforces the former (if it tells the guest at all that
there is an L3 cache), but not the latter.

Paolo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by David Gibson 6 years, 6 months ago
On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:
> On 19/10/2017 13:49, David Gibson wrote:
> > Note that describing socket/core/thread tuples as arch independent (or
> > even machine independent) is.. debatable.  I mean it's flexible enough
> > that most platforms can be fit to that scheme without too much
> > straining.  But, there's no arch independent way of defining what each
> > level means in terms of its properties.
> > 
> > So, for example, on spapr - being paravirt - there's no real
> > distinction between cores and sockets, how you divide them up is
> > completely arbitrary.
> 
> Same on x86, actually.
> 
> It's _common_ that cores on the same socket share L3 cache and that a
> socket spans an integer number of NUMA nodes, but it doesn't have to be
> that way.
> 
> QEMU currently enforces the former (if it tells the guest at all that
> there is an L3 cache), but not the latter.

Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
architecture in terms of this thread/core/socket hierarchy?  That's
not true for PAPR, where the NUMA topology is described in an
independent set of (potentially arbitrarily nested) nodes.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:
> > On 19/10/2017 13:49, David Gibson wrote:
> > > Note that describing socket/core/thread tuples as arch independent (or
> > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > that most platforms can be fit to that scheme without too much
> > > straining.  But, there's no arch independent way of defining what each
> > > level means in terms of its properties.
> > > 
> > > So, for example, on spapr - being paravirt - there's no real
> > > distinction between cores and sockets, how you divide them up is
> > > completely arbitrary.
> > 
> > Same on x86, actually.
> > 
> > It's _common_ that cores on the same socket share L3 cache and that a
> > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > that way.
> > 
> > QEMU currently enforces the former (if it tells the guest at all that
> > there is an L3 cache), but not the latter.
> 
> Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> architecture in terms of this thread/core/socket heirarchy?  That's
> not true for PAPR, where the NUMA topology is described in an
> independent set of (potentially arbitrarily nested) nodes.

On PC, ACPI NUMA information only refers to CPU APIC IDs, which
identify individual CPU threads; it doesn't care about CPU
socket/core/thread topology.  If I'm not mistaken, the
socket/core/thread topology is not represented in ACPI at all.

Some guest OSes, however, may get very confused if they see an
unexpected NUMA/CPU topology.  IIRC, it was possible to make old
Linux kernel versions panic by generating a weird topology.
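The reason SRAT can describe "weird" topologies is that the APIC ID itself is the only handle: topology is packed into its bit fields. The sketch below approximates the packing (fields rounded up to a power of two); QEMU's actual code (`x86_apicid_from_cpu_topology`) handles more levels such as dies, so treat this as an approximation for illustration.

```python
# Approximate sketch of x86 APIC ID packing: each topology level occupies
# a bit field sized to the next power of two of its count, which is why
# APIC IDs can be sparse (and why cpu_index != APIC ID in general).
def bits_for(count):
    bits = 0
    while (1 << bits) < count:
        bits += 1
    return bits

def apic_id(socket, core, thread, cores_per_socket, threads_per_core):
    thread_bits = bits_for(threads_per_core)
    core_bits = bits_for(cores_per_socket)
    return (socket << (core_bits + thread_bits)) | (core << thread_bits) | thread

# With 3 cores per socket the core field rounds up to 2 bits, so socket 1
# starts at APIC ID 4 even though only 3 cores exist: the IDs are sparse.
print(apic_id(1, 0, 0, cores_per_socket=3, threads_per_core=1))  # 4
```

Since SRAT entries name these IDs directly, nothing in the table itself enforces that a node boundary lines up with a socket or core boundary.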

-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 5 months ago
On Fri, 20 Oct 2017 17:53:09 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > that most platforms can be fit to that scheme without too much
> > > > straining.  But, there's no arch independent way of defining what each
> > > > level means in terms of its properties.
> > > > 
> > > > So, for example, on spapr - being paravirt - there's no real
> > > > distinction between cores and sockets, how you divide them up is
> > > > completely arbitrary.  
> > > 
> > > Same on x86, actually.
> > > 
> > > It's _common_ that cores on the same socket share L3 cache and that a
> > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > that way.
> > > 
> > > QEMU currently enforces the former (if it tells the guest at all that
> > > there is an L3 cache), but not the latter.  
> > 
> > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > architecture in terms of this thread/core/socket heirarchy?  That's
> > not true for PAPR, where the NUMA topology is described in an
> > independent set of (potentially arbitrarily nested) nodes.  
> 
> On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> identify individual CPU threads; it doesn't care about CPU
> socket/core/thread topology.  If I'm not mistaken, the
> socket/core/thread topology is not represented in ACPI at all.
> 
> Some guest OSes, however, may get very confused if they see an
> unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> Linux kernel versions panic by generating a weird topology.
That doesn't mean random mapping is the right thing to do.
Even if it doesn't crash Linux guests anymore, it might have
performance implications for the running guest. I'd assume that
outweighs a one-time restart/configure cost at domain XML creation
time; every later time the VM is started it can reuse the cached
options.

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 5 months ago
On Fri, 20 Oct 2017 17:53:09 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > that most platforms can be fit to that scheme without too much
> > > > straining.  But, there's no arch independent way of defining what each
> > > > level means in terms of its properties.
> > > > 
> > > > So, for example, on spapr - being paravirt - there's no real
> > > > distinction between cores and sockets, how you divide them up is
> > > > completely arbitrary.  
> > > 
> > > Same on x86, actually.
> > > 
> > > It's _common_ that cores on the same socket share L3 cache and that a
> > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > that way.
> > > 
> > > QEMU currently enforces the former (if it tells the guest at all that
> > > there is an L3 cache), but not the latter.  
> > 
> > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > architecture in terms of this thread/core/socket heirarchy?  That's
> > not true for PAPR, where the NUMA topology is described in an
> > independent set of (potentially arbitrarily nested) nodes.  
> 
> On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> identify individual CPU threads; it doesn't care about CPU
> socket/core/thread topology.  If I'm not mistaken, the
> socket/core/thread topology is not represented in ACPI at all.
ACPI does node mapping per logical cpu (thread) in the SRAT table,
so virtually we are able to describe insane configurations.
That however doesn't mean that we should go outside of
what real hw does and confuse a guest which may have certain
expectations.

Currently for x86 the expectation is that cpus are mapped to numa
nodes either by whole cores or whole sockets (AMD and Intel cpus
respectively). In future it might change.
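The "whole cores" expectation amounts to a simple invariant: no two threads of the same core may land on different NUMA nodes. A minimal sketch of such a check (illustrative helper, not QEMU code) could look like this:

```python
# Sketch of the sanity check being discussed: every thread of a core must
# map to the same NUMA node.  cpu_to_node maps (socket, core, thread)
# tuples to node ids; this is an illustrative helper, not QEMU code.
def cores_are_whole(cpu_to_node):
    core_node = {}  # (socket, core) -> first node seen for that core
    for (socket, core, thread), node in cpu_to_node.items():
        if core_node.setdefault((socket, core), node) != node:
            return False  # two threads of one core on different nodes
    return True

ok = {(0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1}
bad = {(0, 0, 0): 0, (0, 0, 1): 1}  # splits core 0 across nodes 0 and 1
print(cores_are_whole(ok), cores_are_whole(bad))  # True False
```

A "whole sockets" rule would be the same check keyed on the socket alone.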


> Some guest OSes, however, may get very confused if they see an
> unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> Linux kernel versions panic by generating a weird topology.

There were bugs that were fixed on the QEMU or guest kernel side
when unexpected mappings were present. While we can 'fix' guest
expectations in the Linux kernel, it might not be possible for other
OSes; one more reason we shouldn't allow blind assignment by mgmt.


Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 5 months ago
On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 17:53:09 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > > that most platforms can be fit to that scheme without too much
> > > > > straining.  But, there's no arch independent way of defining what each
> > > > > level means in terms of its properties.
> > > > > 
> > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > distinction between cores and sockets, how you divide them up is
> > > > > completely arbitrary.  
> > > > 
> > > > Same on x86, actually.
> > > > 
> > > > It's _common_ that cores on the same socket share L3 cache and that a
> > > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > > that way.
> > > > 
> > > > QEMU currently enforces the former (if it tells the guest at all that
> > > > there is an L3 cache), but not the latter.  
> > > 
> > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > architecture in terms of this thread/core/socket heirarchy?  That's
> > > not true for PAPR, where the NUMA topology is described in an
> > > independent set of (potentially arbitrarily nested) nodes.  
> > 
> > On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> > identify individual CPU threads; it doesn't care about CPU
> > socket/core/thread topology.  If I'm not mistaken, the
> > socket/core/thread topology is not represented in ACPI at all.
> ACPI does node mapping per logical cpu (thread) in SRAT table,
> so virtually we are able to describe insane configurations.
> That however doesn't mean that we should go outside of
> what real hw does and confuse guest which may have certain
> expectations.

Agreed.

> 
> Currently for x86 expectations are that cpus are mapped to numa
> nodes either by whole cores or whole sockets (AMD and Intel cpus
> respectively). In future it might change.
> 
> 
> > Some guest OSes, however, may get very confused if they see an
> > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > Linux kernel versions panic by generating a weird topology.
> 
> There where bugs that where fixed on QEMU or guest kernel side
> when unexpected mapping were present. While we can 'fix' guest
> expectation in linux kernel it might be not possible for other
> OSes one more reason we shouldn't allow blind assignment by mgmt.

One problem with blocking arbitrary assignment is the possibility
of breaking existing VM configurations.  We could enforce the new
rules only on newer machine-types, although this means an
existing VM configuration may stop being runnable after updating
the machine-type.

-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 5 months ago
On Wed, Oct 25, 2017 at 08:57:43AM +0200, Eduardo Habkost wrote:
> On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> > On Fri, 20 Oct 2017 17:53:09 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > > > that most platforms can be fit to that scheme without too much
> > > > > > straining.  But, there's no arch independent way of defining what each
> > > > > > level means in terms of its properties.
> > > > > > 
> > > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > > distinction between cores and sockets, how you divide them up is
> > > > > > completely arbitrary.  
> > > > > 
> > > > > Same on x86, actually.
> > > > > 
> > > > > It's _common_ that cores on the same socket share L3 cache and that a
> > > > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > > > that way.
> > > > > 
> > > > > QEMU currently enforces the former (if it tells the guest at all that
> > > > > there is an L3 cache), but not the latter.  
> > > > 
> > > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > > architecture in terms of this thread/core/socket heirarchy?  That's
> > > > not true for PAPR, where the NUMA topology is described in an
> > > > independent set of (potentially arbitrarily nested) nodes.  
> > > 
> > > On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> > > identify individual CPU threads; it doesn't care about CPU
> > > socket/core/thread topology.  If I'm not mistaken, the
> > > socket/core/thread topology is not represented in ACPI at all.
> > ACPI does node mapping per logical cpu (thread) in SRAT table,
> > so virtually we are able to describe insane configurations.
> > That however doesn't mean that we should go outside of
> > what real hw does and confuse guest which may have certain
> > expectations.
> 
> Agreed.
> 
> > 
> > Currently for x86 expectations are that cpus are mapped to numa
> > nodes either by whole cores or whole sockets (AMD and Intel cpus
> > respectively). In future it might change.
> > 
> > 
> > > Some guest OSes, however, may get very confused if they see an
> > > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > > Linux kernel versions panic by generating a weird topology.
> > 
> > There where bugs that where fixed on QEMU or guest kernel side
> > when unexpected mapping were present. While we can 'fix' guest
> > expectation in linux kernel it might be not possible for other
> > OSes one more reason we shouldn't allow blind assignment by mgmt.
> 
> One problem with blocking arbitrary assignment is the possibility
> of breaking existing VM configurations.  We could enforce the new
> rules only on newer machine-types, although this means an
> existing VM configuration may stop being runnable after updating
> the machine-type.

We should also be wary of blocking something just because some guest OSes
are unhappy. Other guest OSes may be perfectly OK with the configuration
and shouldn't be prevented from using it if their admin wants it.

IOW, we should only consider blocking things that are disallowed
by relevant specs, or would impose functional or security problems
in the host. If it is merely that some guest OS are unhappy with
certain configs, that's just a docs problem (eg Windows won't use
more than 2 sockets in many versions, but we shouldn't block use
of more than 2 sockets of course).


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 5 months ago
On Wed, Oct 25, 2017 at 08:02:06AM +0100, Daniel P. Berrange wrote:
> On Wed, Oct 25, 2017 at 08:57:43AM +0200, Eduardo Habkost wrote:
> > On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> > > On Fri, 20 Oct 2017 17:53:09 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > 
> > > > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > > > > that most platforms can be fit to that scheme without too much
> > > > > > > straining.  But, there's no arch independent way of defining what each
> > > > > > > level means in terms of its properties.
> > > > > > > 
> > > > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > > > distinction between cores and sockets, how you divide them up is
> > > > > > > completely arbitrary.  
> > > > > > 
> > > > > > Same on x86, actually.
> > > > > > 
> > > > > > It's _common_ that cores on the same socket share L3 cache and that a
> > > > > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > > > > that way.
> > > > > > 
> > > > > > QEMU currently enforces the former (if it tells the guest at all that
> > > > > > there is an L3 cache), but not the latter.  
> > > > > 
> > > > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > > > architecture in terms of this thread/core/socket heirarchy?  That's
> > > > > not true for PAPR, where the NUMA topology is described in an
> > > > > independent set of (potentially arbitrarily nested) nodes.  
> > > > 
> > > > On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> > > > identify individual CPU threads; it doesn't care about CPU
> > > > socket/core/thread topology.  If I'm not mistaken, the
> > > > socket/core/thread topology is not represented in ACPI at all.
> > > ACPI does node mapping per logical cpu (thread) in SRAT table,
> > > so virtually we are able to describe insane configurations.
> > > That however doesn't mean that we should go outside of
> > > what real hw does and confuse guest which may have certain
> > > expectations.
> > 
> > Agreed.
> > 
> > > 
> > > Currently for x86 expectations are that cpus are mapped to numa
> > > nodes either by whole cores or whole sockets (AMD and Intel cpus
> > > respectively). In future it might change.
> > > 
> > > 
> > > > Some guest OSes, however, may get very confused if they see an
> > > > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > > > Linux kernel versions panic by generating a weird topology.
> > > 
> > > There were bugs that were fixed on the QEMU or guest kernel side
> > > when unexpected mappings were present. While we can 'fix' guest
> > > expectations in the Linux kernel, that may not be possible for other
> > > OSes; one more reason we shouldn't allow blind assignment by mgmt.
> > 
> > One problem with blocking arbitrary assignment is the possibility
> > of breaking existing VM configurations.  We could enforce the new
> > rules only on newer machine-types, although this means an
> > existing VM configuration may stop being runnable after updating
> > the machine-type.
> 
> We should also be wary of blocking something just because some guest OS
> are unhappy. Other guest OS may be perfectly OK with the configuration
> and shouldn't be prevented from using it if their admin wants it.
> 
> IOW, we should only consider blocking things that are disallowed
> by relevant specs, or would impose functional or security problems
> in the host. If it is merely that some guest OS are unhappy with
> certain configs, that's just a docs problem (eg Windows won't use
> more than 2 sockets in many versions, but we shouldn't block use
> of more than 2 sockets of course).

I agree with this for things that only some guests are unhappy
with, but I'm wary of allowing something that is not expected to
work on any guest and is known to cause issues just because it's
not forbidden by the spec.  Supporting things that are actually
useful and supported by guest OSes is already hard enough.

For new features, I'd rather be conservative and allow only
configurations that are expected to work.  We can always update
QEMU later to allow something that wasn't allowed before.

For existing features like thread-level NUMA binding, the best
solution is not always obvious because people may be relying on
them.

-- 
Eduardo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
----- Original Message -----
> From: "Daniel P. Berrange" <berrange@redhat.com>
> To: "Igor Mammedov" <imammedo@redhat.com>
> Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> Sent: Wednesday, October 18, 2017 5:30:10 PM
> Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> 
> On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > On Tue, 17 Oct 2017 16:07:59 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >   
> > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > option
> > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > NUMA mapping for cpus.
> > > > > 
> > > > > What's the problem we're seeking solve here compared to what we
> > > > > currently
> > > > > do for NUMA configuration ?
> > > > From RHBZ1382425
> > > > "
> > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > values for -numa cpus=... QEMU CLI option.
> > > 
> > > In broad terms, this problem applies to every device / object libvirt
> > > asks QEMU to create. For everything else libvirt is able to assign a
> > > "id" string, which is can then use to identify the thing later. The
> > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > libvirt has to infer. The latter is the same problem we had with
> > > devices before '-device' was introduced allowing 'id' naming.
> > > 
> > > IMHO we should take the same approach with CPUs and start modelling
> > > the individual CPUs as something we can explicitly create with -object
> > > or -device. That way libvirt can assign names and does not have to
> > > care about CPU index values, and it all works just the same way as
> > > any other devices / object we create
> > > 
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > the follow up question would be where do "socket=3,core=1,thread=0"
> > come from, currently these options are the function of
> > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > runtime after qemu parses -M and -smp options.
> 
> NB, I realize my example was open to mis-interpretation. The values I'm
> illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> are a plain enumeration of values. ie this is saying the 4th socket, the
> 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> with a core-id of 8, or 7038 or whatever architecture specific numbering
> scheme makes sense, but that's not what the mgmt app gives at the CLI
> level
Even though the simplicity of fixed properties/values is tempting, and it might
even work for what we have implemented in QEMU currently (well, SPAPR would need
refactoring (if possible) to meet the requirements, plus compat handling for
current machines with sparse IDs), I have to disagree here and oppose it.

QEMU models concrete platforms/hw with certain non-abstract properties,
and it's libvirt's domain to translate platform-specific devices into
'spherical' devices with abstract properties.

Now back to cpus and the suggestion to fix the set of 'address' properties
and their values into a contiguous enumeration range [0..N). That would:
  1. put the burden of hiding platform/device details on QEMU
      (which is bad, as QEMU's job is to emulate them)
  2. with abstract 'address' properties and values, the user won't have
     a clue as to where a device is being attached (as QEMU would magically
     remap that to fit the specific machine's needs)
  2.1. with abstract 'address' properties and values we could do away with
     socket/core/thread/whatnot, since they won't mean the same thing when
     considered from the platform's point of view, so we could just drop all
     this nonsense and go back to cpu-index, which has all the properties
     you've suggested /abstract, [0..N)/.
  3. we currently settled on socket|core|thread-id properties as they are
     applicable to machines that support -device cpu, but it's up to the machine
     to pick which of these to use (x86 uses all of them, spapr uses core-id only),
     and the current property set is open for extension if the need arises, without
     redefining the interface. So a fixed list of properties [even ignoring
     the impact on values] doesn't scale.

We even have the cpu-add command, which takes cpu-index as an argument, and
the -numa node,cpus=0..X CLI option; good luck figuring out which cpu goes
where and whether it makes any sense from the platform's point of view.

That's why, when designing hotplug for the 'device_add cpu' interface, we ended up
with the new query-hotpluggable-cpus QMP command, which is currently used by libvirt
for hotplug:
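As an illustration, the exchange looks roughly like this on the QMP wire (a
sketch only; the response shape follows the HMP session in the cover letter
for '-smp 1,maxcpus=2', hand-written here rather than captured from QEMU):

```python
# Sketch of a query-hotpluggable-cpus exchange. mgmt treats 'props' as an
# opaque, machine-defined bag of properties and echoes it back verbatim.
request = {"execute": "query-hotpluggable-cpus"}

response = {
    "return": [
        {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
         "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
        {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
         "qom-path": "/machine/unattached/device[0]",
         "props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
    ]
}

def cpu_props(response):
    """Extract the machine-provided 'props' for each possible CPU; these
    are exactly what device_add (or a numa-mapping command) would take."""
    return [cpu["props"] for cpu in response["return"]]
```

The key point is that mgmt never has to compute or interpret cpu_index
values; it only replays properties the machine itself published.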

This approach allows:
   1: the machine to publish properties/values that make sense from the emulated
      platform's point of view but are still understandable by a user of the given hw.
   2: the user to treat them as opaque mandatory properties to create a cpu device
      if he/she doesn't care about where it's plugged.
   3: if the user cares about which cpu goes where, the properties defined by the
      machine provide that info from the emulated hw's point of view, including
      platform-specific details.
   4: easy extension of the set of properties/values if the need arises, without
      breaking users (provided the user puts them all in -device/device_add
      options as is supposed to be done)

But the current approach has a drawback: to call query-hotpluggable-cpus, the
machine has to be started first, which is fine for hotplug but not for
specifying CLI options.

Currently that could be solved by starting QEMU twice when 'defining' the domain:
on the first run mgmt queries the board layout and caches it for all subsequent
starts of the defined machine (a change in machine/version/-smp/-cpu would
invalidate the cache).

This series makes it possible to avoid that first-time restart: when creating a
domain for the first time, mgmt can query the layout and then specify the NUMA
mapping without restarting. It can cache the defined mapping, as the commands
exactly match the corresponding CLI options, and reuse the cached options on
subsequent domain starts.

This approach could be extended further with the "device_add cpu" command, so it
would be possible to start QEMU with -smp 0,... and let mgmt create CPUs with
explicit IDs under its control; again, mgmt may cache these commands and reuse
them on the CLI the next time the machine is started.

I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
but working towards the same goal: to let mgmt discover which hw is provided by
a specific machine and where/which hw could be plugged (i.e. which slot supports
which kind of device, and which 'address' should be used to attach a device
(socket|core|... for cpus, bus/function for pci, ...)).
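The mgmt-side flow this series enables can be sketched as follows. This is
hypothetical code: 'set-numa-node' and its arguments are the commands proposed
in this RFC (mirroring the HMP session in the cover letter), not an existing
QMP interface, and the round-trip to a live QEMU is elided.

```python
def numa_commands(hotpluggable_cpus, assign):
    """Build the proposed set-numa-node QMP commands from a (possibly cached)
    query-hotpluggable-cpus result and a cpu-index -> node-id assignment.
    The machine-provided 'props' are passed back verbatim, never interpreted."""
    nodes = sorted(set(assign.values()))
    # First declare the nodes, matching '-numa node,nodeid=N' on the CLI.
    cmds = [{"execute": "set-numa-node",
             "arguments": {"type": "node", "nodeid": n}} for n in nodes]
    # Then map each cpu, matching 'set-numa-node cpu,...' in the HMP session.
    for i, cpu in enumerate(hotpluggable_cpus):
        args = dict(cpu["props"])          # opaque machine-defined properties
        args["node-id"] = assign[i]
        cmds.append({"execute": "set-numa-node",
                     "arguments": {"type": "cpu", **args}})
    return cmds

# Example for the '-smp 1,maxcpus=2' layout from the cover letter:
cpus = [{"props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
        {"props": {"socket-id": 1, "core-id": 0, "thread-id": 0}}]
cmds = numa_commands(cpus, {0: 0, 1: 1})
```

Because each command maps 1:1 to a '-numa node,...'/'-numa cpu,...' CLI
option, mgmt can cache the command list and replay it as CLI options on
subsequent starts, which is the caching scheme described above.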
 
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> ----- Original Message -----
> > From: "Daniel P. Berrange" <berrange@redhat.com>
> > To: "Igor Mammedov" <imammedo@redhat.com>
> > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > 
> > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > 
> > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >   
> > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > option
> > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > NUMA mapping for cpus.
> > > > > > 
> > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > currently
> > > > > > do for NUMA configuration ?
> > > > > From RHBZ1382425
> > > > > "
> > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > values for -numa cpus=... QEMU CLI option.
> > > > 
> > > > In broad terms, this problem applies to every device / object libvirt
> > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > "id" string, which is can then use to identify the thing later. The
> > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > libvirt has to infer. The latter is the same problem we had with
> > > > devices before '-device' was introduced allowing 'id' naming.
> > > > 
> > > > IMHO we should take the same approach with CPUs and start modelling
> > > > the individual CPUs as something we can explicitly create with -object
> > > > or -device. That way libvirt can assign names and does not have to
> > > > care about CPU index values, and it all works just the same way as
> > > > any other devices / object we create
> > > > 
> > > > ie instead of:
> > > > 
> > > >   -smp 8,sockets=4,cores=2,threads=1
> > > >   -numa node,nodeid=0,cpus=0-3
> > > >   -numa node,nodeid=1,cpus=4-7
> > > > 
> > > > we could do:
> > > > 
> > > >   -object numa-node,id=numa0
> > > >   -object numa-node,id=numa1
> > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > come from, currently these options are the function of
> > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > runtime after qemu parses -M and -smp options.
> > 
> > NB, I realize my example was open to mis-interpretation. The values I'm
> > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > are a plain enumeration of values. ie this is saying the 4th socket, the
> > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > level
> Even though fixed properties/values simplicity is tempting and it might even
> work for what we have implemented in qemu currently (well, SPAPR will need
> refactoring (if possible) to meet requirements + compat stuff for current
> machines with sparse IDs).
> But I have to disagree here and try to oppose it.
> 
> QEMU models concrete platforms/hw with certain non abstract properties
> and it's libvirt's domain to translate platform specific devices into
> 'spherical' devices with abstract properties.
> 
> Now back to cpus and suggestion to fix the set of 'address' properties
> and their values into continuous enumeration range [0..N). That would
>   1. put a burden of hiding platform/device details on QEMU
>       (which is already bad as QEMU's job is to emulate it)
>   2. with abstract 'address' properties and values, user won't have
>      a clue as to where device is being attached (as qemu would magically
>      remap that to fit specific machine needs)
>   2.1. if abstract 'address' properties and values we can do away with
>      socket/core/thread/whatnot since they won't mean the same when considered
>      from platform point of view, so we can just drop all these nonsense
>      and go back to cpu-index that has all the properties you've suggested
>      /abstract, [0..N]/.
>   3. we currently stopped with socket|core|thread-id properties as they are
>      applicable to machines that support -device cpu, but it's up to machine
>      to pick witch of these to use (x86: uses all, spar: uses core-id only),
>      but current property set is open for extension if need arises without
>      need to redefine interface. So fixed list of properties [even ignoring
>      values impact] doesn't scale.

Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
guest XML, we just provide an overall count of sockets/cores/threads, which is
portable. The only arch-specific thing we would have to do is express constraints
on the ratios of these - eg indicate in some way that ppc doesn't allow multiple
threads per core.

> We even have cpu-add command which takes cpu-index as argument and
> -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> where and if it makes any sense from platform point of view.
> 
> That's why when designing hot plug for 'device_add cpu' interface, we ended up
> with new query-hotpluggble-cpus QMP command, which is currently used by libvirt
> for hot-plug:
> 
> Approach allows 
>    1: machine to publish properties/values that make sense from emulated
>       platform point of view but still understandable by user of given hw.
>    2: user may use them as opaque mandatory properties to create cpu device if
>       he/she doesn't care about where it's plugged.
>    3: if user cares about which cpu goes where, properties defined by machine
>       provide that info from emulated hw point of view including platform specific
>       details.
>    4: it's easy to extend set of properties/values if need arises without
>       breaking users (provided user will put them all in -device/device_add
>       options as it's supposed to)
> 
> But current approach has drawback, to call query-hotpluggble-cpus, machine has to
> be started first, which is fine for hot plug but not for specifying CLI options.
> 
> Currently that could be solved by starting qemu twice when 'defining domain',
> where on the first run mgmt queries board layout and caches it for all the next
> times the defined machine is started (change in machine/version/-smp/-cpu will
> invalidate, cache).
> 
> This series allows to avoid this 1st time restart, when creating domain for
> the first time, mgmt can query layout and then specify numa mapping without
> restarting, it can cache defined mapping as commands exactly match corresponding
> CLI options and reuse cached options on the next domain starts.
> 
> This approach could be extended further with "device_add cpu" command
> so it would be possible to start qemu with -smp 0,... and allow mgmt to
> create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> these commands and reuse them on CLI next time machine is started
> 
> I think Eduardo's work on query-slots is superset of query-hotpluggble-cpus,
> but working to the same goal to allow mgmt discover which hw is provided by
> specific machine and where/which hw could be plugged (like which slot supports
> which kind of device and which 'address' should be used to attach device
> (socket|core... - for cpus, bus/function - for pic, ...)

As mentioned elsewhere in the thread, the approach of defining the VM config
incrementally via the monitor has significant downsides: it makes the config
invisible in any logs of the ARGV, and it has a likely performance impact when
starting up QEMU, particularly if it is used for more things going forward. To
me these downsides are enough to make the suggested approach for CPUs
impractical for libvirt to use.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > ----- Original Message -----
> > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > 
> > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > 
> > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >   
> > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > option
> > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > NUMA mapping for cpus.
> > > > > > > 
> > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > currently
> > > > > > > do for NUMA configuration ?
> > > > > > From RHBZ1382425
> > > > > > "
> > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > values for -numa cpus=... QEMU CLI option.
> > > > > 
> > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > "id" string, which is can then use to identify the thing later. The
> > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > 
> > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > the individual CPUs as something we can explicitly create with -object
> > > > > or -device. That way libvirt can assign names and does not have to
> > > > > care about CPU index values, and it all works just the same way as
> > > > > any other devices / object we create
> > > > > 
> > > > > ie instead of:
> > > > > 
> > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > >   -numa node,nodeid=0,cpus=0-3
> > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > 
> > > > > we could do:
> > > > > 
> > > > >   -object numa-node,id=numa0
> > > > >   -object numa-node,id=numa1
> > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > come from, currently these options are the function of
> > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > runtime after qemu parses -M and -smp options.
> > > 
> > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > level
> > Even though fixed properties/values simplicity is tempting and it might even
> > work for what we have implemented in qemu currently (well, SPAPR will need
> > refactoring (if possible) to meet requirements + compat stuff for current
> > machines with sparse IDs).
> > But I have to disagree here and try to oppose it.
> > 
> > QEMU models concrete platforms/hw with certain non abstract properties
> > and it's libvirt's domain to translate platform specific devices into
> > 'spherical' devices with abstract properties.
> > 
> > Now back to cpus and suggestion to fix the set of 'address' properties
> > and their values into continuous enumeration range [0..N). That would
> >   1. put a burden of hiding platform/device details on QEMU
> >       (which is already bad as QEMU's job is to emulate it)
> >   2. with abstract 'address' properties and values, user won't have
> >      a clue as to where device is being attached (as qemu would magically
> >      remap that to fit specific machine needs)
> >   2.1. if abstract 'address' properties and values we can do away with
> >      socket/core/thread/whatnot since they won't mean the same when considered
> >      from platform point of view, so we can just drop all these nonsense
> >      and go back to cpu-index that has all the properties you've suggested
> >      /abstract, [0..N]/.
> >   3. we currently stopped with socket|core|thread-id properties as they are
> >      applicable to machines that support -device cpu, but it's up to machine
> >      to pick witch of these to use (x86: uses all, spar: uses core-id only),
> >      but current property set is open for extension if need arises without
> >      need to redefine interface. So fixed list of properties [even ignoring
> >      values impact] doesn't scale.
> 
> Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> guest XML, we just provide an overall count of sockets/cores/threads which is
> portable. The only arch specific thing we would have todo is express constraints
> about ratios of these - eg indicate in some way that ppc doesn't allow mutliple
> threads per core for example.
> 
> > We even have cpu-add command which takes cpu-index as argument and
> > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > where and if it makes any sense from platform point of view.
> > 
> > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > with new query-hotpluggble-cpus QMP command, which is currently used by libvirt
> > for hot-plug:
> > 
> > Approach allows 
> >    1: machine to publish properties/values that make sense from emulated
> >       platform point of view but still understandable by user of given hw.
> >    2: user may use them as opaque mandatory properties to create cpu device if
> >       he/she doesn't care about where it's plugged.
> >    3: if user cares about which cpu goes where, properties defined by machine
> >       provide that info from emulated hw point of view including platform specific
> >       details.
> >    4: it's easy to extend set of properties/values if need arises without
> >       breaking users (provided user will put them all in -device/device_add
> >       options as it's supposed to)
> > 
> > But current approach has drawback, to call query-hotpluggble-cpus, machine has to
> > be started first, which is fine for hot plug but not for specifying CLI options.
> > 
> > Currently that could be solved by starting qemu twice when 'defining domain',
> > where on the first run mgmt queries board layout and caches it for all the next
> > times the defined machine is started (change in machine/version/-smp/-cpu will
> > invalidate, cache).
> > 
> > This series allows to avoid this 1st time restart, when creating domain for
> > the first time, mgmt can query layout and then specify numa mapping without
> > restarting, it can cache defined mapping as commands exactly match corresponding
> > CLI options and reuse cached options on the next domain starts.
> > 
> > This approach could be extended further with a "device_add cpu" command, so
> > it would be possible to start qemu with -smp 0,... and let mgmt create cpus
> > with explicit IDs under its control; again, mgmt may cache these commands and
> > reuse them on the CLI the next time the machine is started.
> > 
> > I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
> > working toward the same goal: letting mgmt discover which hw a specific
> > machine provides and where/which hw could be plugged (e.g. which slot
> > supports which kind of device, and which 'address' should be used to attach
> > a device: socket|core... for cpus, bus/function for pci, ...)
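The query-then-cache workflow described above can be sketched end to end. The
helper below is hypothetical, and the set-numa-node argument names follow this
RFC's proposed command, so they are assumptions rather than an existing API: it
takes a query-hotpluggable-cpus reply and spreads the cpus round-robin across
nodes, yielding the arguments mgmt would send (and could cache as the matching
-numa CLI options):

```python
import json

def numa_map_commands(hotpluggable_cpus, num_nodes):
    """Spread cpus round-robin over num_nodes NUMA nodes.

    Returns one argument dict per proposed set-numa-node command:
    first the node definitions, then one cpu mapping per entry of a
    query-hotpluggable-cpus reply (props are treated as opaque).
    """
    cmds = [{"type": "node", "nodeid": n} for n in range(num_nodes)]
    for i, cpu in enumerate(hotpluggable_cpus):
        arg = dict(cpu["props"])          # machine-published, opaque
        arg["type"] = "cpu"
        arg["node-id"] = i % num_nodes
        cmds.append(arg)
    return cmds

# A reply shaped like query-hotpluggable-cpus output for '-smp 2,sockets=2':
cpus = [
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
]
for arg in numa_map_commands(cpus, 2):
    print(json.dumps(arg, sort_keys=True))
```

The round-robin policy is only an example; the point is that mgmt builds the
mapping from queried, opaque props instead of guessing cpu_index values.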
> 
> As mentioned elsewhere in the thread, the approach of defining the VM config
> incrementally via the monitor has significant downsides: it makes the config
> invisible in any logs of the ARGV, and it has a likely performance impact when
> starting up QEMU, particularly if it is used for more things going forward. To
> me these downsides are enough to make the suggested approach for CPUs
> impractical for libvirt to use.

Those downsides do exist, but we should weigh them against the
downsides of not allowing any information at all to flow from
QEMU to libvirt when starting a VM.

I believe the code in libvirt/src/qemu/qemu_domain_address.c is
a good illustration of those downsides.

-- 
Eduardo

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > > ----- Original Message -----
> > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > 
> > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > 
> > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >   
> > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > option
> > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > NUMA mapping for cpus.
> > > > > > > > 
> > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > currently
> > > > > > > > do for NUMA configuration ?
> > > > > > > From RHBZ1382425
> > > > > > > "
> > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > values for -numa cpus=... QEMU CLI option.
> > > > > > 
> > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > 
> > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > care about CPU index values, and it all works just the same way as
> > > > > > any other devices / object we create
> > > > > > 
> > > > > > ie instead of:
> > > > > > 
> > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > 
> > > > > > we could do:
> > > > > > 
> > > > > >   -object numa-node,id=numa0
> > > > > >   -object numa-node,id=numa1
> > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > come from, currently these options are the function of
> > > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > > runtime after qemu parses -M and -smp options.
> > > > 
> > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > level
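The distinction drawn here, plain ordinals at the CLI versus machine-internal
IDs, can be illustrated with a small translation helper (hypothetical, for
illustration only): given a query-hotpluggable-cpus reply, a mgmt app could
resolve "the 2nd core" to whatever core-id the machine actually published,
keeping the ID values opaque.

```python
def ordinal_to_ids(hotpluggable_cpus, socket, core, thread):
    """Translate dense (socket, core, thread) ordinals into the
    machine's own, possibly sparse, IDs.  Only the sort order of the
    published values is used; the values themselves stay opaque."""
    def axis(name):
        return sorted({c["props"][name] for c in hotpluggable_cpus})
    return {"socket-id": axis("socket-id")[socket],
            "core-id": axis("core-id")[core],
            "thread-id": axis("thread-id")[thread]}

# A machine that numbers its cores 0 and 8 (sparse, as in the example
# above of a 2nd core that internally has core-id 8):
cpus = [{"props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
        {"props": {"socket-id": 0, "core-id": 8, "thread-id": 0}}]
print(ordinal_to_ids(cpus, 0, 1, 0))  # the 2nd core resolves to core-id 8
```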
> > > Even though fixed properties/values simplicity is tempting and it might even
> > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > refactoring (if possible) to meet requirements + compat stuff for current
> > > machines with sparse IDs).
> > > But I have to disagree here and try to oppose it.
> > > 
> > > QEMU models concrete platforms/hw with certain non abstract properties
> > > and it's libvirt's domain to translate platform specific devices into
> > > 'spherical' devices with abstract properties.
> > > 
> > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > and their values into continuous enumeration range [0..N). That would
> > >   1. put a burden of hiding platform/device details on QEMU
> > >       (which is already bad as QEMU's job is to emulate it)
> > >   2. with abstract 'address' properties and values, user won't have
> > >      a clue as to where device is being attached (as qemu would magically
> > >      remap that to fit specific machine needs)
> > >   2.1. with abstract 'address' properties and values we can do away with
> > >      socket/core/thread/whatnot since they won't mean the same when considered
> > >      from the platform's point of view, so we can just drop all this nonsense
> > >      and go back to cpu-index that has all the properties you've suggested
> > >      /abstract, [0..N]/.
> > >   3. we currently stopped with socket|core|thread-id properties as they are
> > >      applicable to machines that support -device cpu, but it's up to machine
> > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> > >      but current property set is open for extension if need arises without
> > >      need to redefine interface. So fixed list of properties [even ignoring
> > >      values impact] doesn't scale.
> > 
> > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > guest XML, we just provide an overall count of sockets/cores/threads which is
> > portable. The only arch-specific thing we would have to do is express constraints
> > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > threads per core for example.
> > 
> > > We even have cpu-add command which takes cpu-index as argument and
> > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > where and if it makes any sense from platform point of view.
> > > 
> > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > for hot-plug:
> > > 
> > > The approach allows:
> > >    1: machine to publish properties/values that make sense from emulated
> > >       platform point of view but still understandable by user of given hw.
> > >    2: user may use them as opaque mandatory properties to create cpu device if
> > >       he/she doesn't care about where it's plugged.
> > >    3: if user cares about which cpu goes where, properties defined by machine
> > >       provide that info from emulated hw point of view including platform specific
> > >       details.
> > >    4: it's easy to extend set of properties/values if need arises without
> > >       breaking users (provided user will put them all in -device/device_add
> > >       options as it's supposed to)
> > > 
> > > But the current approach has a drawback: to call query-hotpluggable-cpus, the machine has to
> > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > 
> > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > where on the first run mgmt queries board layout and caches it for all the next
> > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > invalidate the cache).
> > > 
> > > This series allows to avoid this 1st time restart, when creating domain for
> > > the first time, mgmt can query layout and then specify numa mapping without
> > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > CLI options and reuse cached options on the next domain starts.
> > > 
> > > This approach could be extended further with "device_add cpu" command
> > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > these commands and reuse them on CLI next time machine is started
> > > 
> > > I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
> > > but working to the same goal to allow mgmt discover which hw is provided by
> > > specific machine and where/which hw could be plugged (like which slot supports
> > > which kind of device and which 'address' should be used to attach device
> > > (socket|core... - for cpus, bus/function - for pci, ...)
> > 
> > As mentioned elsewhere in the thread, the approach of defining the VM config
> > incrementally via the monitor has significant downsides, by making the config
> > invisible in any logs of the ARGV, and has likely performance impact when
> > starting up QEMU, particularly if it is used for more things going forward. To
> > me these downsides are enough to make the suggested approach for CPUs impractical
> > for libvirt to use.
> 
> Those downsides do exist, but we should weigh them against the
> downsides of not allowing any information at all to flow from
> QEMU to libvirt when starting a VM.
> 
> I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> a good illustration of those downsides.

Right, but for this NUMA / CPU scenario I don't think we're going to end up
with complexity like this. I still believe we are able to come up with a
way to represent it at the CLI without so much architecture specific
knowledge.

Even if that is not possible though, from the libvirt POV the extra complexity
is worth it if that is what we need to preserve fast startup time. The
time to start a guest is very important to apps like libguestfs and libvirt
sandbox, so going down a direction which is likely to add hundreds or even
thousands of milliseconds to the startup time is not desirable, even if it
makes libvirt simpler.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Eduardo Habkost 6 years, 6 months ago
On Fri, Oct 20, 2017 at 10:07:27AM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > > > ----- Original Message -----
> > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > 
> > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > 
> > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > >   
> > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > option
> > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > NUMA mapping for cpus.
> > > > > > > > > 
> > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > currently
> > > > > > > > > do for NUMA configuration ?
> > > > > > > > From RHBZ1382425
> > > > > > > > "
> > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > values for -numa cpus=... QEMU CLI option.
> > > > > > > 
> > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > 
> > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > any other devices / object we create
> > > > > > > 
> > > > > > > ie instead of:
> > > > > > > 
> > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > 
> > > > > > > we could do:
> > > > > > > 
> > > > > > >   -object numa-node,id=numa0
> > > > > > >   -object numa-node,id=numa1
> > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > come from, currently these options are the function of
> > > > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > > > runtime after qemu parses -M and -smp options.
> > > > > 
> > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > level
> > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > machines with sparse IDs).
> > > > But I have to disagree here and try to oppose it.
> > > > 
> > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > and it's libvirt's domain to translate platform specific devices into
> > > > 'spherical' devices with abstract properties.
> > > > 
> > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > and their values into continuous enumeration range [0..N). That would
> > > >   1. put a burden of hiding platform/device details on QEMU
> > > >       (which is already bad as QEMU's job is to emulate it)
> > > >   2. with abstract 'address' properties and values, user won't have
> > > >      a clue as to where device is being attached (as qemu would magically
> > > >      remap that to fit specific machine needs)
> > > >   2.1. with abstract 'address' properties and values we can do away with
> > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > >      from the platform's point of view, so we can just drop all this nonsense
> > > >      and go back to cpu-index that has all the properties you've suggested
> > > >      /abstract, [0..N]/.
> > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > >      applicable to machines that support -device cpu, but it's up to machine
> > > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> > > >      but current property set is open for extension if need arises without
> > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > >      values impact] doesn't scale.
> > > 
> > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > portable. The only arch-specific thing we would have to do is express constraints
> > > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > > threads per core for example.
> > > 
> > > > We even have cpu-add command which takes cpu-index as argument and
> > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > where and if it makes any sense from platform point of view.
> > > > 
> > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > > for hot-plug:
> > > > 
> > > > The approach allows:
> > > >    1: machine to publish properties/values that make sense from emulated
> > > >       platform point of view but still understandable by user of given hw.
> > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > >       he/she doesn't care about where it's plugged.
> > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > >       provide that info from emulated hw point of view including platform specific
> > > >       details.
> > > >    4: it's easy to extend set of properties/values if need arises without
> > > >       breaking users (provided user will put them all in -device/device_add
> > > >       options as it's supposed to)
> > > > 
> > > > But the current approach has a drawback: to call query-hotpluggable-cpus, the machine has to
> > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > 
> > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > invalidate the cache).
> > > > 
> > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > CLI options and reuse cached options on the next domain starts.
> > > > 
> > > > This approach could be extended further with "device_add cpu" command
> > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > these commands and reuse them on CLI next time machine is started
> > > > 
> > > > I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
> > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > which kind of device and which 'address' should be used to attach device
> > > > (socket|core... - for cpus, bus/function - for pci, ...)
> > > 
> > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > incrementally via the monitor has significant downsides, by making the config
> > > invisible in any logs of the ARGV, and has likely performance impact when
> > > starting up QEMU, particularly if it is used for more things going forward. To
> > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > for libvirt to use.
> > 
> > Those downsides do exist, but we should weigh them against the
> > downsides of not allowing any information at all to flow from
> > QEMU to libvirt when starting a VM.
> > 
> > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > a good illustration of those downsides.
> 
> Right, but for this NUMA / CPU scenario I don't think we're going to end up
> with complexity like this. I still believe we are able to come up with a
> way to represent it at the CLI without so much architecture specific
> knowledge.

In the case of NUMA/CPU, I'm inclined to agree.


> 
> Even if that is not possible though, from the libvirt POV the extra complexity
> is worth it if that is what we need to preserve fast startup time. The
> time to start a guest is very important to apps like libguestfs and libvirt
> sandbox, so going down a direction which is likely to add hundreds or even
> thousands of milliseconds to the startup time is not desirable, even if it
> makes libvirt simpler.

I don't believe this is likely to add hundreds or thousands of
milliseconds to startup time, but I agree we need to keep an eye
on startup time while introducing new interfaces.

-- 
Eduardo

Re: [libvirt] [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 5 months ago
On Fri, 20 Oct 2017 18:07:03 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 10:07:27AM +0100, Daniel P. Berrange wrote:
> > On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:  
> > > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:  
> > > > > ----- Original Message -----  
> > > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > > 
> > > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >   
> > > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > > >     
> > > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > > option
> > > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > > NUMA mapping for cpus.  
> > > > > > > > > > 
> > > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > > currently
> > > > > > > > > > do for NUMA configuration ?  
> > > > > > > > > From RHBZ1382425
> > > > > > > > > "
> > > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > > values for -numa cpus=... QEMU CLI option.  
> > > > > > > > 
> > > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > > 
> > > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > > any other devices / object we create
> > > > > > > > 
> > > > > > > > ie instead of:
> > > > > > > > 
> > > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > > 
> > > > > > > > we could do:
> > > > > > > > 
> > > > > > > >   -object numa-node,id=numa0
> > > > > > > >   -object numa-node,id=numa1
> > > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > > come from, currently these options are the function of
> > > > > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > > > > runtime after qemu parses -M and -smp options.  
> > > > > > 
> > > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > > level  
> > > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > > machines with sparse IDs).
> > > > > But I have to disagree here and try to oppose it.
> > > > > 
> > > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > > and it's libvirt's domain to translate platform specific devices into
> > > > > 'spherical' devices with abstract properties.
> > > > > 
> > > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > > and their values into continuous enumeration range [0..N). That would
> > > > >   1. put a burden of hiding platform/device details on QEMU
> > > > >       (which is already bad as QEMU's job is to emulate it)
> > > > >   2. with abstract 'address' properties and values, user won't have
> > > > >      a clue as to where device is being attached (as qemu would magically
> > > > >      remap that to fit specific machine needs)
> > > > >   2.1. with abstract 'address' properties and values we can do away with
> > > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > > >      from the platform's point of view, so we can just drop all this nonsense
> > > > >      and go back to cpu-index that has all the properties you've suggested
> > > > >      /abstract, [0..N]/.
> > > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > > >      applicable to machines that support -device cpu, but it's up to machine
> > > > >      to pick witch of these to use (x86: uses all, spar: uses core-id only),
> > > > >      but current property set is open for extension if need arises without
> > > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > > >      values impact] doesn't scale.  
> > > > 
> > > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > > portable. The only arch specific thing we would have todo is express constraints
> > > > about ratios of these - eg indicate in some way that ppc doesn't allow mutliple
> > > > threads per core for example.
> > > >   
> > > > > We even have cpu-add command which takes cpu-index as argument and
> > > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > > where and if it makes any sense from platform point of view.
> > > > > 
> > > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > > with new query-hotpluggble-cpus QMP command, which is currently used by libvirt
> > > > > for hot-plug:
> > > > > 
> > > > > Approach allows 
> > > > >    1: machine to publish properties/values that make sense from emulated
> > > > >       platform point of view but still understandable by user of given hw.
> > > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > > >       he/she doesn't care about where it's plugged.
> > > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > > >       provide that info from emulated hw point of view including platform specific
> > > > >       details.
> > > > >    4: it's easy to extend set of properties/values if need arises without
> > > > >       breaking users (provided user will put them all in -device/device_add
> > > > >       options as it's supposed to)
> > > > > 
> > > > > But current approach has drawback, to call query-hotpluggble-cpus, machine has to
> > > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > > 
> > > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > > invalidate, cache).
> > > > > 
> > > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > > CLI options and reuse cached options on the next domain starts.
> > > > > 
> > > > > This approach could be extended further with "device_add cpu" command
> > > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > > these commands and reuse them on CLI next time machine is started
> > > > > 
> > > > > I think Eduardo's work on query-slots is superset of query-hotpluggble-cpus,
> > > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > > which kind of device and which 'address' should be used to attach device
> > > > > (socket|core... - for cpus, bus/function - for pic, ...)  
> > > > 
> > > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > > incrementally via the monitor has significant downsides, by making the config
> > > > invisible in any logs of the ARGV, and has likely performance impact when
> > > > starting up QEMU, particularly if it is used for more things going forward. To
> > > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > > for libvirt to use.  
> > > 
> > > Those downsides do exist, but we should weight them against the
> > > downsides of not allowing any information at all to flow from
> > > QEMU to libvirt when starting a VM.
> > > 
> > > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > > a good illustration of those downsides.  
> > 
> > Right, but for this NUMA / CPU scenario I don't think we're going to end up
> > with complexity like this. I still believe we are able to come up with a
> > way to represent it at the CLI without so much architecture specific
> > knowledge.  
> 
> In the case of NUMA/CPU, I'm inclined to agree.
Perhaps I don't see how it could be made non-arch-specific since
I've stared at the issue for too long, so ideas on how it could
be made arch-agnostic while still fitting arch-specific expectations
are welcome.


> > Even if that is not possible though, from libvirt POV the extra complexity
> > is worth it, if that is what we need to preserve fast startup time. The
> > time to start a guest is very important to apps like libguestfs and libvirt
> > sandbox, so going down a direction which is likely to add 100's or even 1000's
> > of milliseconds to the startup time is not desirable, even if it makes libvirt
> > simpler  
> 
> I don't believe this is likely to add 100's or 1000's of
> milliseconds to startup time, but I agree we need to keep an eye
> on startup time while introducing new interfaces.
In the case of configuration over the network, delays might be arbitrary,
but it's a one-time cost since the discovered layout could be cached in
the domain config when it's defined for the first time.


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 5 months ago
On Fri, 20 Oct 2017 10:07:27 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:  
> > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:  
> > > > ----- Original Message -----  
> > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > 
> > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >   
> > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > >     
> > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > option
> > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > NUMA mapping for cpus.  
> > > > > > > > > 
> > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > currently
> > > > > > > > > do for NUMA configuration ?  
> > > > > > > > From RHBZ1382425
> > > > > > > > "
> > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > values for -numa cpus=... QEMU CLI option.  
> > > > > > > 
> > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > 
> > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > any other devices / object we create
> > > > > > > 
> > > > > > > ie instead of:
> > > > > > > 
> > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > 
> > > > > > > we could do:
> > > > > > > 
> > > > > > >   -object numa-node,id=numa0
> > > > > > >   -object numa-node,id=numa1
> > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > come from, currently these options are the function of
> > > > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > > > runtime after qemu parses -M and -smp options.  
> > > > > 
> > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > level  
> > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > machines with sparse IDs).
> > > > But I have to disagree here and try to oppose it.
> > > > 
> > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > and it's libvirt's domain to translate platform specific devices into
> > > > 'spherical' devices with abstract properties.
> > > > 
> > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > and their values into continuous enumeration range [0..N). That would
> > > >   1. put a burden of hiding platform/device details on QEMU
> > > >       (which is already bad as QEMU's job is to emulate it)
> > > >   2. with abstract 'address' properties and values, user won't have
> > > >      a clue as to where device is being attached (as qemu would magically
> > > >      remap that to fit specific machine needs)
> > > >   2.1. if abstract 'address' properties and values we can do away with
> > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > >      from platform point of view, so we can just drop all these nonsense
> > > >      and go back to cpu-index that has all the properties you've suggested
> > > >      /abstract, [0..N]/.
> > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > >      applicable to machines that support -device cpu, but it's up to machine
> > > >      to pick witch of these to use (x86: uses all, spar: uses core-id only),
> > > >      but current property set is open for extension if need arises without
> > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > >      values impact] doesn't scale.  
> > > 
> > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > portable. The only arch specific thing we would have todo is express constraints
> > > about ratios of these - eg indicate in some way that ppc doesn't allow mutliple
> > > threads per core for example.
> > >   
> > > > We even have cpu-add command which takes cpu-index as argument and
> > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > where and if it makes any sense from platform point of view.
> > > > 
> > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > with new query-hotpluggble-cpus QMP command, which is currently used by libvirt
> > > > for hot-plug:
> > > > 
> > > > Approach allows 
> > > >    1: machine to publish properties/values that make sense from emulated
> > > >       platform point of view but still understandable by user of given hw.
> > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > >       he/she doesn't care about where it's plugged.
> > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > >       provide that info from emulated hw point of view including platform specific
> > > >       details.
> > > >    4: it's easy to extend set of properties/values if need arises without
> > > >       breaking users (provided user will put them all in -device/device_add
> > > >       options as it's supposed to)
> > > > 
> > > > But current approach has drawback, to call query-hotpluggble-cpus, machine has to
> > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > 
> > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > invalidate, cache).
> > > > 
> > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > CLI options and reuse cached options on the next domain starts.
> > > > 
> > > > This approach could be extended further with "device_add cpu" command
> > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > these commands and reuse them on CLI next time machine is started
> > > > 
> > > > I think Eduardo's work on query-slots is superset of query-hotpluggble-cpus,
> > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > which kind of device and which 'address' should be used to attach device
> > > > (socket|core... - for cpus, bus/function - for pic, ...)  
> > > 
> > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > incrementally via the monitor has significant downsides, by making the config
> > > invisible in any logs of the ARGV, and has likely performance impact when
> > > starting up QEMU, particularly if it is used for more things going forward. To
> > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > for libvirt to use.  
> > 
> > Those downsides do exist, but we should weight them against the
> > downsides of not allowing any information at all to flow from
> > QEMU to libvirt when starting a VM.
> > 
> > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > a good illustration of those downsides.  
> 
> Right, but for this NUMA / CPU scenario I don't think we're going to end up
> with complexity like this. I still believe we are able to come up with a
> way to represent it at the CLI without so much architecture specific
> knowledge.
Unfortunately CPU-to-node mapping isn't arch-agnostic, and upper layers
need to understand the arch-specific details when they compose the QEMU CLI with it.

> 
> Even if that is not possible though, from libvirt POV the extra complexity
> is worth it, if that is what we need to preserve fast startup time. The
> time to start a guest is very important to apps like libguestfs and libvirt
> sandbox, so going down a direction which is likely to add 100's or even 1000's
> of milliseconds to the startup time is not desirable, even if it makes libvirt
> simpler
Neither of the above tools uses NUMA configuration, so it's not really
applicable there.

Can we cache the machine layout when the domain is created for the first time
and reuse the cached values the next time the guest is started?
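[Editor's note: a sketch of the caching idea proposed in the thread: on first start, query the board layout once and cache it keyed on everything that can change it (machine type, QEMU version, -smp, -cpu); later starts reuse the cached mapping. All names here are hypothetical; `query_fn` stands in for a real query-hotpluggable-cpus round trip.]

```python
import hashlib
import json

def cache_key(machine, qemu_version, smp, cpu_model):
    # Anything that can change the layout invalidates the cache.
    blob = json.dumps([machine, qemu_version, smp, cpu_model])
    return hashlib.sha256(blob.encode()).hexdigest()

_cache = {}

def get_layout(machine, qemu_version, smp, cpu_model, query_fn):
    """query_fn stands in for running query-hotpluggable-cpus once."""
    key = cache_key(machine, qemu_version, smp, cpu_model)
    if key not in _cache:
        _cache[key] = query_fn()          # one-time cost at define time
    return _cache[key]

calls = []
fake_query = lambda: calls.append(1) or [{"props": {"core-id": 0}}]
get_layout("pc-q35-2.11", "2.11.0", "1,maxcpus=2", "qemu64", fake_query)
get_layout("pc-q35-2.11", "2.11.0", "1,maxcpus=2", "qemu64", fake_query)
assert len(calls) == 1    # second start hits the cache
```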

> 
> Regards,
> Daniel

Re: [libvirt] [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 5 months ago
On Mon, Oct 23, 2017 at 12:04:17PM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 10:07:27 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> > > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:  
> > > > > ----- Original Message -----  
> > > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > > 
> > > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >   
> > > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > > >     
> > > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > > option
> > > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > > NUMA mapping for cpus.  
> > > > > > > > > > 
> > > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > > currently
> > > > > > > > > > do for NUMA configuration ?  
> > > > > > > > > From RHBZ1382425
> > > > > > > > > "
> > > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > > values for -numa cpus=... QEMU CLI option.  
> > > > > > > > 
> > > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > > 
> > > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > > any other devices / object we create
> > > > > > > > 
> > > > > > > > ie instead of:
> > > > > > > > 
> > > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > > 
> > > > > > > > we could do:
> > > > > > > > 
> > > > > > > >   -object numa-node,id=numa0
> > > > > > > >   -object numa-node,id=numa1
> > > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > > come from, currently these options are the function of
> > > > > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > > > > runtime after qemu parses -M and -smp options.  
> > > > > > 
> > > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > > illustrating here for socket=3,core=1,thread=0 and *not* ID values, they
> > > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > > level  
> > > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > > machines with sparse IDs).
> > > > > But I have to disagree here and try to oppose it.
> > > > > 
> > > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > > and it's libvirt's domain to translate platform specific devices into
> > > > > 'spherical' devices with abstract properties.
> > > > > 
> > > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > > and their values into continuous enumeration range [0..N). That would
> > > > >   1. put a burden of hiding platform/device details on QEMU
> > > > >       (which is already bad as QEMU's job is to emulate it)
> > > > >   2. with abstract 'address' properties and values, user won't have
> > > > >      a clue as to where device is being attached (as qemu would magically
> > > > >      remap that to fit specific machine needs)
> > > > >   2.1. if abstract 'address' properties and values we can do away with
> > > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > > >      from platform point of view, so we can just drop all these nonsense
> > > > >      and go back to cpu-index that has all the properties you've suggested
> > > > >      /abstract, [0..N]/.
> > > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > > >      applicable to machines that support -device cpu, but it's up to machine
> > > > >      to pick witch of these to use (x86: uses all, spar: uses core-id only),
> > > > >      but current property set is open for extension if need arises without
> > > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > > >      values impact] doesn't scale.  
> > > > 
> > > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > > portable. The only arch specific thing we would have todo is express constraints
> > > > about ratios of these - eg indicate in some way that ppc doesn't allow mutliple
> > > > threads per core for example.
> > > >   
> > > > > We even have cpu-add command which takes cpu-index as argument and
> > > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > > where and if it makes any sense from platform point of view.
> > > > > 
> > > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > > with new query-hotpluggble-cpus QMP command, which is currently used by libvirt
> > > > > for hot-plug:
> > > > > 
> > > > > Approach allows 
> > > > >    1: machine to publish properties/values that make sense from emulated
> > > > >       platform point of view but still understandable by user of given hw.
> > > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > > >       he/she doesn't care about where it's plugged.
> > > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > > >       provide that info from emulated hw point of view including platform specific
> > > > >       details.
> > > > >    4: it's easy to extend set of properties/values if need arises without
> > > > >       breaking users (provided user will put them all in -device/device_add
> > > > >       options as it's supposed to)
> > > > > 
> > > > > But current approach has drawback, to call query-hotpluggble-cpus, machine has to
> > > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > > 
> > > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > > invalidate, cache).
> > > > > 
> > > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > > CLI options and reuse cached options on the next domain starts.
> > > > > 
> > > > > This approach could be extended further with "device_add cpu" command
> > > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > > these commands and reuse them on CLI next time machine is started
> > > > > 
> > > > > I think Eduardo's work on query-slots is superset of query-hotpluggble-cpus,
> > > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > > which kind of device and which 'address' should be used to attach device
> > > > > (socket|core... - for cpus, bus/function - for pic, ...)  
> > > > 
> > > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > > incrementally via the monitor has significant downsides, by making the config
> > > > invisible in any logs of the ARGV, and has likely performance impact when
> > > > starting up QEMU, particularly if it is used for more things going forward. To
> > > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > > for libvirt to use.  
> > > 
> > > Those downsides do exist, but we should weight them against the
> > > downsides of not allowing any information at all to flow from
> > > QEMU to libvirt when starting a VM.
> > > 
> > > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > > a good illustration of those downsides.  
> > 
> > Right, but for this NUMA / CPU scenario I don't think we're going to end up
> > with complexity like this. I still believe we are able to come up with a
> > way to represent it at the CLI without so much architecture specific
> > knowledge.
> Unfortunately cpu to node mapping isn't arch agnostic and requires
> understanding from upper layers when they compose QEMU CLI with it.

In terms of the guest config it manages, Libvirt doesn't care about the
low-level core-id, socket-id, thread-id values. It just knows from the
application that it has to request 4 sockets, with 4 cores, with 2 threads.
Whether the cores get given core-id 0, 1, 2, 3 vs 1, 2, 16, 17 does not
matter to libvirt, nor to the application using libvirt. So to avoid
architecture differences at startup we just need to be able to configure
the topology without referring to the architecture-specific integer ID
values.
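[Editor's note: a minimal sketch of the enumeration described above: from the counts in the guest config (e.g. 4 sockets x 4 cores x 2 threads) generate plain 0-based coordinates, independent of whatever sparse socket-id/core-id values the machine uses internally.]

```python
from itertools import product

def enumerate_topology(sockets, cores, threads):
    """Plain 0-based enumeration of CPU slots, not arch-specific IDs."""
    return [{"socket": s, "core": c, "thread": t}
            for s, c, t in product(range(sockets), range(cores), range(threads))]

topo = enumerate_topology(4, 4, 2)
assert len(topo) == 32
assert topo[0] == {"socket": 0, "core": 0, "thread": 0}
assert topo[-1] == {"socket": 3, "core": 3, "thread": 1}
```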

There are also some architecture constraints with respect to which combinations
of sockets/cores/threads are available with given CPU models. If we are to
avoid arch-specific code, these constraints need to be exposed to libvirt,
which would in turn expose them to the application, to let the application
decide how best to set up the CPU topology when it creates the guest. For
this to be useful to the application it has to be provided separately from
guest startup, because, e.g., OpenStack decides this aspect of guest
configuration before it even decides which host to run the guest on, let
alone tries to start the guest.
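[Editor's note: a hypothetical shape for the constraint data suggested above: each machine type advertises which threads-per-core values it accepts, and the application checks its desired topology against that before deciding where to run the guest. The constraint format and the values are invented here for illustration; QEMU defines no such interface in this thread.]

```python
# Invented constraint table: which threads-per-core values a machine
# type accepts. Values are illustrative, not authoritative.
CONSTRAINTS = {
    "pc-q35":  {"threads-per-core": {1, 2}},
    "pseries": {"threads-per-core": {1, 2, 4, 8}},
}

def topology_ok(machine, threads):
    """Check a requested threads-per-core value against the machine's
    advertised constraints, with no arch-specific code in the caller."""
    return threads in CONSTRAINTS[machine]["threads-per-core"]

assert topology_ok("pseries", 8)
assert not topology_ok("pc-q35", 8)
```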

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Paolo Bonzini 6 years, 6 months ago
On 17/10/2017 17:07, Daniel P. Berrange wrote:
> ie instead of:
> 
>   -smp 8,sockets=4,cores=2,threads=1
>   -numa node,nodeid=0,cpus=0-3
>   -numa node,nodeid=1,cpus=4-7
> 
> we could do:
> 
>   -object numa-node,id=numa0
>   -object numa-node,id=numa1
>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> 
> (perhaps -device instead of -object above, but that's a minor detail)

I understand that this is just an example, but wasn't this what is solved by

  -smp 8,sockets=4,cores=2,threads=1
  -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
  -numa node,nodeid=1 -numa cpu,node-id=1,socket-id=2-3

?

Paolo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Daniel P. Berrange 6 years, 6 months ago
On Wed, Oct 18, 2017 at 02:19:54PM +0200, Paolo Bonzini wrote:
> On 17/10/2017 17:07, Daniel P. Berrange wrote:
> > ie instead of:
> > 
> >   -smp 8,sockets=4,cores=2,threads=1
> >   -numa node,nodeid=0,cpus=0-3
> >   -numa node,nodeid=1,cpus=4-7
> > 
> > we could do:
> > 
> >   -object numa-node,id=numa0
> >   -object numa-node,id=numa1
> >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > 
> > (perhaps -device instead of -object above, but that's a minor detail)
> 
> I understand that this is just an example, but wasn't this what is solved by
> 
>   -smp 8,sockets=4,cores=2,threads=1
>   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
>   -numa node,nodeid=1 -numa cpu,node-id=1,socket-id=2-3

IIUC, that lets you associate CPUs with NUMA nodes without having to know
the internal QEMU indexes. It won't help you with any monitor commands you
need to run later that expect the CPU index as an input value.  My example
lets you assign IDs to each CPU, which can then be used for monitor
commands too - I should have illustrated that bit of it too.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Paolo Bonzini 6 years, 6 months ago
On 18/10/2017 14:27, Daniel P. Berrange wrote:
>>>   -object numa-node,id=numa0
>>>   -object numa-node,id=numa1
>>>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>>>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>>>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>>>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>>>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>>>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>>>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>>>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
>>>
>>> (perhaps -device instead of -object above, but that's a minor detail)
>> I understand that this is just an example, but wasn't this what is solved by
>>
>>   -smp 8,sockets=4,cores=2,threads=1
>>   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
>>   -numa node,nodeid=1 -numa cpu,node-id=1,socket-id=2-3
> IIUC, that lets you associate CPUs with NUMA nodes without having to know
> the internal QEMU indexes. It won't help you with any monitor commands you
> need to run later that expect the CPU index as an input value.  My example
> lets you assign IDs to each CPU, which can then be used for monitor
> commands too - I should have illustrated that bit of it too.

I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
returned data.

Paolo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Wed, 18 Oct 2017 14:33:49 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 18/10/2017 14:27, Daniel P. Berrange wrote:
> >>>   -object numa-node,id=numa0
> >>>   -object numa-node,id=numa1
> >>>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >>>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >>>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >>>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >>>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >>>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >>>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >>>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> >>>
> >>> (perhaps -device instead of -object above, but that's a minor detail)  
> >> I understand that this is just an example, but wasn't this what is solved by
> >>
> >>   -smp 8,sockets=4,cores=2,threads=1
> >>   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
> >>   -numa node,nodeid=1 -numa cpu,node-id=1,socket-id=2-3
> > IIUC, that lets you associate CPUs with NUMA nodes without having to know
> > the internal QEMU indexes. It won't help you with any monitor commands you
> > need to run later that expect the CPU index as an input value.  My example
> > lets you assign IDs to each CPU, which can then be used for monitor
> > commands too - I should have illustrated that bit of it too.
> 
> I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
> returned data.
I guess query-cpus can/does provide cpu-index already;
for query-hotpluggable-cpus it would depend on what's shown there
(it would work for x86/arm/s390, as they publish CPUState-based objects
there, but spapr puts cores there, which themselves do not have a cpu-index;
their children do, though)


> 
> Paolo
> 


Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Paolo Bonzini 6 years, 6 months ago
On 18/10/2017 16:26, Igor Mammedov wrote:
>> I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
>> returned data.
> 
> I guess query-cpus can/does provide cpu-index already;
> for query-hotpluggable-cpus it would depend on what's shown there
> (it would work for x86/arm/s390, as they publish CPUState-based objects
> there, but spapr puts cores there, which themselves do not have a cpu-index;
> their children do, though)

Yeah, that's why I put "first-cpu-index".  The idea is that indices go
from first-cpu-index to first-cpu-index + vcpus-count - 1.
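In management-software terms, the expansion Paolo describes could look like this sketch (note `first-cpu-index` is the hypothetical field proposed here, not an existing QAPI field):

```python
# Sketch: expand one query-hotpluggable-cpus entry into the cpu-index
# values it covers, using the proposed (hypothetical) "first-cpu-index"
# field.  Indices run from first-cpu-index to
# first-cpu-index + vcpus-count - 1.

def expand_cpu_indexes(entry):
    """Return the list of cpu-index values an entry covers."""
    first = entry["first-cpu-index"]
    return list(range(first, first + entry["vcpus-count"]))

# A spapr-style core entry with 2 threads whose indices start at 4:
entry = {"type": "power8_v2.0-spapr-cpu-core",
         "vcpus-count": 2,
         "first-cpu-index": 4,
         "props": {"core-id": 8}}
print(expand_cpu_indexes(entry))  # [4, 5]
```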

Paolo

Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Wed, 18 Oct 2017 16:29:38 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 18/10/2017 16:26, Igor Mammedov wrote:
> >> I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
> >> returned data.  
> > 
> > I guess query-cpus can/does provide cpu-index already;
> > for query-hotpluggable-cpus it would depend on what's shown there
> > (it would work for x86/arm/s390, as they publish CPUState-based objects
> > there, but spapr puts cores there, which themselves do not have a cpu-index;
> > their children do, though)
> 
> Yeah, that's why I put "first-cpu-index".  The idea is that indices go
> from first-cpu-index to first-cpu-index + vcpus-count - 1.
Yep, so far that holds.

We can also add optional extra entries there for each
thread, like this:

Hotpluggable CPUs:
  type: "power8_v2.0-spapr-cpu-core"
  vcpus_count: "1"
  qom_path: "/machine/unattached/device[0]"
  children threads:
           /machine/unattached/device[0]/thread[0]
           /machine/unattached/device[0]/thread[1]
  CPUInstance Properties:
    core-id: "0"

or ignore the high-level query-hotpluggable-cpus and use the existing
query-cpus, which already provides the QOM path to threads,

and replace the cpu-index based monitor commands with QOM-path
based ones (though that won't change the fact that both are owned by QEMU).
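A minimal sketch of that QOM-path based addressing, assuming a query-cpus-like response shape (the data below is hand-written for illustration, not captured from a real QEMU instance):

```python
# Sketch: resolve a thread-level QOM path from query-cpus-style output,
# so a monitor command can address a CPU thread without a cpu-index.
# The response shape here is illustrative only.
cpus = [
    {"cpu-index": 0, "qom_path": "/machine/unattached/device[0]/thread[0]"},
    {"cpu-index": 1, "qom_path": "/machine/unattached/device[0]/thread[1]"},
]

def qom_path_for(cpus, index):
    """Map a legacy cpu-index to its thread-level QOM path."""
    for cpu in cpus:
        if cpu["cpu-index"] == index:
            return cpu["qom_path"]
    raise KeyError(index)

print(qom_path_for(cpus, 1))  # /machine/unattached/device[0]/thread[1]
```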

> Paolo


Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Posted by Igor Mammedov 6 years, 6 months ago
On Wed, 18 Oct 2017 13:27:15 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Oct 18, 2017 at 02:19:54PM +0200, Paolo Bonzini wrote:
> > On 17/10/2017 17:07, Daniel P. Berrange wrote:  
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > 
> > > (perhaps -device instead of -object above, but that's a minor detail)  
> > 
> > I understand that this is just an example, but wasn't this what is solved by
> > 
> >   -smp 8,sockets=4,cores=2,threads=1
> >   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
> >   -numa node,nodeid=1 -numa cpu,node-id=1,socket-id=2-3
> 
> IIUC, that lets you associate CPUs with NUMA nodes without having to know
> the internal QEMU indexes. 
Yep, with -numa cpu the user doesn't need to know cpu_index values anymore,
but the next things the user needs to know are:
  1: the set of properties the machine supports
      x86: "{socket|core|thread}-id"
      spapr: "core-id"
      s390: ...
  2: what values to use for the above properties;
     they might form a 0..n interval but might also be sparse
     (spapr, for example). In other words, the values are owned by the
     machine and might also depend on its version.

The above 2 points were the reason why query-hotpluggable-cpus
was introduced, so that mgmt would query QEMU for the valid set
of properties/values for a given set of options and could
compose a valid device_add command for hotplug.
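That query-then-compose flow can be sketched as follows (the entries below are hand-written examples in the shape of query-hotpluggable-cpus output, not captured from a real QEMU):

```python
# Sketch: mgmt composes a device_add from a query-hotpluggable-cpus
# entry, never hard-coding which topology properties (socket/core/
# thread-id vs. just core-id) or which values the machine uses.

def device_add_from_entry(entry, node_id=None):
    """Build a device_add QMP command from a hotpluggable-cpus entry."""
    args = {"driver": entry["type"]}
    args.update(entry["props"])      # machine-owned property names/values
    if node_id is not None:
        args["node-id"] = node_id
    return {"execute": "device_add", "arguments": args}

# x86-style entry: three topology properties with contiguous values
x86 = {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
       "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}}
# spapr-style entry: a single core-id whose values may be sparse
spapr = {"type": "power8_v2.0-spapr-cpu-core", "vcpus-count": 8,
         "props": {"core-id": 16}}

print(device_add_from_entry(x86, node_id=0))
print(device_add_from_entry(spapr, node_id=1))
```

Because the property names and values come straight out of the query response, the same management code works unchanged for x86, spapr, or s390 machines.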

I'm not opposed to adding
  -smp 0 -device foo-cpu,id=cpuX,...
as Daniel suggests, but libvirt would have to implement the logic that
makes up #1+#2 (which probably means duplicating it from QEMU)


> It won't help you with any monitor commands you
> need to run later that expect the CPU index as an input value.  My example
> lets you assign IDs to each CPU, which can then be used for monitor
> commands too - I should have illustrated that bit of it too.
Monitor commands that take a cpu-index are a separate, unrelated story, though.
They need to be reworked to use socket|core|thread-id where it makes
sense.
(
For example: a spapr core can't be used as the address with the 'cpu' command,
as that command expects a thread-level object, and other commands operate on /
expect a CPUState to be pointed at. Introducing an explicit ID set by mgmt
won't be of use here either, as it would be set on the core object while the
child threads remain name-less.
In such use cases we can use the QOM path to the thread of interest, which
can be queried at runtime with query-cpus, which gives a thread-level view.
)
 
> Regards,
> Daniel