[v3] pc-dimm: factor out MemoryDevice

[Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

Right now we can only map PCDIMM/NVDIMM into guest address space. In the
future, we might want to do the same for virtio devices - e.g.
virtio-pmem or virtio-mem. Especially, they should be able to live side
by side to each other.

E.g. the virto based memory devices regions will not be exposed via ACPI
and friends. They will be detected just like other virtio devices and
indicate the applicable memory region. This makes it possible to also use
them on architectures without memory device detection support (e.g. s390x).

Let's factor out the memory device code into a MemoryDevice interface.


v2 -> v3:
- "pc-dimm: factor out MemoryDevice interface"
--> Lookup both classes when comparing (David Gibson)

v1 -> v2:
- Fix compile issues on ppc (still untested  )


David Hildenbrand (3):
  pc-dimm: factor out MemoryDevice interface
  machine: make MemoryHotplugState accessible via the machine
  pc-dimm: factor out address space logic into MemoryDevice code

 hw/i386/acpi-build.c                         |   3 +-
 hw/i386/pc.c                                 |  24 ++-
 hw/mem/Makefile.objs                         |   1 +
 hw/mem/memory-device.c                       | 282 +++++++++++++++++++++++++
 hw/mem/pc-dimm.c                             | 304 +++++++--------------------
 hw/ppc/spapr.c                               |  24 ++-
 hw/ppc/spapr_hcall.c                         |   1 +
 include/hw/boards.h                          |  16 ++
 include/hw/mem/memory-device.h               |  48 +++++
 include/hw/mem/pc-dimm.h                     |  26 +--
 numa.c                                       |   3 +-
 qmp.c                                        |   4 +-
 stubs/Makefile.objs                          |   2 +-
 stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
 14 files changed, 465 insertions(+), 277 deletions(-)
 create mode 100644 hw/mem/memory-device.c
 create mode 100644 include/hw/mem/memory-device.h
 rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)

-- 
2.14.3

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by Pankaj Gupta 7 years, 9 months ago

Hello,

This is good re-factoring and needed for 'virtio-pmem' as well to
reserve memory region in system address space.

I have tested this code with virtio-pmem and its working fine. Thank you
for the work.

I just have a small suggestion : when functions like(get_addr(), get_plugged_size etc) 
in the interface are not provided by derived class, Qemu crashes.

I think having a contract for must override functions with NULL check and error
at the calling sites would be better?

Thanks,
Pankaj  

> Right now we can only map PCDIMM/NVDIMM into guest address space. In the
> future, we might want to do the same for virtio devices - e.g.
> virtio-pmem or virtio-mem. Especially, they should be able to live side
> by side to each other.
> 
> E.g. the virto based memory devices regions will not be exposed via ACPI
> and friends. They will be detected just like other virtio devices and
> indicate the applicable memory region. This makes it possible to also use
> them on architectures without memory device detection support (e.g. s390x).
> 
> Let's factor out the memory device code into a MemoryDevice interface.
> 
> 
> v2 -> v3:
> - "pc-dimm: factor out MemoryDevice interface"
> --> Lookup both classes when comparing (David Gibson)
> 
> v1 -> v2:
> - Fix compile issues on ppc (still untested  )
> 
> 
> David Hildenbrand (3):
>   pc-dimm: factor out MemoryDevice interface
>   machine: make MemoryHotplugState accessible via the machine
>   pc-dimm: factor out address space logic into MemoryDevice code
> 
>  hw/i386/acpi-build.c                         |   3 +-
>  hw/i386/pc.c                                 |  24 ++-
>  hw/mem/Makefile.objs                         |   1 +
>  hw/mem/memory-device.c                       | 282 +++++++++++++++++++++++++
>  hw/mem/pc-dimm.c                             | 304
>  +++++++--------------------
>  hw/ppc/spapr.c                               |  24 ++-
>  hw/ppc/spapr_hcall.c                         |   1 +
>  include/hw/boards.h                          |  16 ++
>  include/hw/mem/memory-device.h               |  48 +++++
>  include/hw/mem/pc-dimm.h                     |  26 +--
>  numa.c                                       |   3 +-
>  qmp.c                                        |   4 +-
>  stubs/Makefile.objs                          |   2 +-
>  stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
>  14 files changed, 465 insertions(+), 277 deletions(-)
>  create mode 100644 hw/mem/memory-device.c
>  create mode 100644 include/hw/mem/memory-device.h
>  rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)
> 
> --
> 2.14.3
> 
> 
>

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

On 22.04.2018 06:58, Pankaj Gupta wrote:
> 
> Hello,
> 
> This is good re-factoring and needed for 'virtio-pmem' as well to
> reserve memory region in system address space.
> 
> I have tested this code with virtio-pmem and its working fine. Thank you
> for the work.
> 
> I just have a small suggestion : when functions like(get_addr(), get_plugged_size etc) 
> in the interface are not provided by derived class, Qemu crashes.
> 
> I think having a contract for must override functions with NULL check and error
> at the calling sites would be better?

I expect that all of these functions are implemented. It's a contract
for devices that are mapped into address space. We might later have
additional functions that might not be required to be provided and will
be checked for NULL.

So for the current set of functions, I don't think it makes sense to
make them optional.

Thanks!

> 
> Thanks,
> Pankaj  

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by Pankaj Gupta 7 years, 9 months ago

> > 
> > Hello,
> > 
> > This is good re-factoring and needed for 'virtio-pmem' as well to
> > reserve memory region in system address space.
> > 
> > I have tested this code with virtio-pmem and its working fine. Thank you
> > for the work.
> > 
> > I just have a small suggestion : when functions like(get_addr(),
> > get_plugged_size etc)
> > in the interface are not provided by derived class, Qemu crashes.
> > 
> > I think having a contract for must override functions with NULL check and
> > error
> > at the calling sites would be better?
> 
> I expect that all of these functions are implemented. It's a contract
> for devices that are mapped into address space. We might later have
> additional functions that might not be required to be provided and will
> be checked for NULL.
> 
> So for the current set of functions, I don't think it makes sense to
> make them optional.

o.k. that's reasonable.

Thanks,
Pankaj
> 
> Thanks!
> 
> > 
> > Thanks,
> > Pankaj
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
>

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by Igor Mammedov 7 years, 9 months ago

On Fri, 20 Apr 2018 14:34:53 +0200
David Hildenbrand <david@redhat.com> wrote:

> Right now we can only map PCDIMM/NVDIMM into guest address space. In the
> future, we might want to do the same for virtio devices - e.g.
> virtio-pmem or virtio-mem. Especially, they should be able to live side
> by side to each other.
> 
> E.g. the virto based memory devices regions will not be exposed via ACPI
> and friends. They will be detected just like other virtio devices and
> indicate the applicable memory region. This makes it possible to also use
> them on architectures without memory device detection support (e.g. s390x).
> 
> Let's factor out the memory device code into a MemoryDevice interface.
A couple of high level questions as relevant code is not here:

  1. what would hotplug/unplug call chain look like in case of virtio-pmem device
     (reason I'm asking is that pmem being PCI device would trigger
      PCI bus hotplug controller and then it somehow should piggyback
      to Machine provided hotplug handlers, so I wonder what kind of
      havoc it would cause on hotplug infrastructure)

  2. why not use PCI bar mapping mechanism to do mapping since pmem is PCI device?  

 
> v2 -> v3:
> - "pc-dimm: factor out MemoryDevice interface"
> --> Lookup both classes when comparing (David Gibson)  
> 
> v1 -> v2:
> - Fix compile issues on ppc (still untested  )
> 
> 
> David Hildenbrand (3):
>   pc-dimm: factor out MemoryDevice interface
>   machine: make MemoryHotplugState accessible via the machine
>   pc-dimm: factor out address space logic into MemoryDevice code
> 
>  hw/i386/acpi-build.c                         |   3 +-
>  hw/i386/pc.c                                 |  24 ++-
>  hw/mem/Makefile.objs                         |   1 +
>  hw/mem/memory-device.c                       | 282 +++++++++++++++++++++++++
>  hw/mem/pc-dimm.c                             | 304 +++++++--------------------
>  hw/ppc/spapr.c                               |  24 ++-
>  hw/ppc/spapr_hcall.c                         |   1 +
>  include/hw/boards.h                          |  16 ++
>  include/hw/mem/memory-device.h               |  48 +++++
>  include/hw/mem/pc-dimm.h                     |  26 +--
>  numa.c                                       |   3 +-
>  qmp.c                                        |   4 +-
>  stubs/Makefile.objs                          |   2 +-
>  stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
>  14 files changed, 465 insertions(+), 277 deletions(-)
>  create mode 100644 hw/mem/memory-device.c
>  create mode 100644 include/hw/mem/memory-device.h
>  rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)
>

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

On 23.04.2018 14:31, Igor Mammedov wrote:
> On Fri, 20 Apr 2018 14:34:53 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Right now we can only map PCDIMM/NVDIMM into guest address space. In the
>> future, we might want to do the same for virtio devices - e.g.
>> virtio-pmem or virtio-mem. Especially, they should be able to live side
>> by side to each other.
>>
>> E.g. the virto based memory devices regions will not be exposed via ACPI
>> and friends. They will be detected just like other virtio devices and
>> indicate the applicable memory region. This makes it possible to also use
>> them on architectures without memory device detection support (e.g. s390x).
>>
>> Let's factor out the memory device code into a MemoryDevice interface.
> A couple of high level questions as relevant code is not here:

Important remark first: We also have s390x with virtio-ccw.

This essentially will also allow s390x guests to have
- memory hotplug via virtio-mem
- fake dax devices via virtio-pmem

> 
>   1. what would hotplug/unplug call chain look like in case of virtio-pmem device
>      (reason I'm asking is that pmem being PCI device would trigger
>       PCI bus hotplug controller and then it somehow should piggyback
>       to Machine provided hotplug handlers, so I wonder what kind of
>       havoc it would cause on hotplug infrastructure)
> 
>   2. why not use PCI bar mapping mechanism to do mapping since pmem is PCI device?  

pmem might be a PCI device, but virtio-pmem is a virtio device (however
it will be exposed via a proxy)

> 

I think both questions are best answered by Pankaj (already on CC).

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by Pankaj Gupta 7 years, 9 months ago

Hi Igor,

> 
> > Right now we can only map PCDIMM/NVDIMM into guest address space. In the
> > future, we might want to do the same for virtio devices - e.g.
> > virtio-pmem or virtio-mem. Especially, they should be able to live side
> > by side to each other.
> > 
> > E.g. the virto based memory devices regions will not be exposed via ACPI
> > and friends. They will be detected just like other virtio devices and
> > indicate the applicable memory region. This makes it possible to also use
> > them on architectures without memory device detection support (e.g. s390x).
> > 
> > Let's factor out the memory device code into a MemoryDevice interface.
> A couple of high level questions as relevant code is not here:
> 
>   1. what would hotplug/unplug call chain look like in case of virtio-pmem
>   device
>      (reason I'm asking is that pmem being PCI device would trigger
>       PCI bus hotplug controller and then it somehow should piggyback
>       to Machine provided hotplug handlers, so I wonder what kind of
>       havoc it would cause on hotplug infrastructure)

For first phase we are using 'virtio-pmem' as cold added devices. AFAIU
'VirtioDeviceClass' being parent class and 'hotplug/unplug' methods implemented 
for virtio-pmem device. So, pci bus hotplug/unplug should call the corresponding
functions?

> 
>   2. why not use PCI bar mapping mechanism to do mapping since pmem is PCI
>   device?

I think even we use if as PCI BAR mapping with PCI we still need free guest physical 
address to provide to VM for mapping the memory range. For that there needs to 
be coordination between PCDIMM and VIRTIO pci device? Also, if we use RAM from QEMU 
address space tied to big region(system_memory) memory accounting gets easier and at single place.

Honestly speaking I don't think there will be much difference between the two approaches? unless
I am missing something important?

> 
>  
> > v2 -> v3:
> > - "pc-dimm: factor out MemoryDevice interface"
> > --> Lookup both classes when comparing (David Gibson)
> > 
> > v1 -> v2:
> > - Fix compile issues on ppc (still untested  )
> > 
> > 
> > David Hildenbrand (3):
> >   pc-dimm: factor out MemoryDevice interface
> >   machine: make MemoryHotplugState accessible via the machine
> >   pc-dimm: factor out address space logic into MemoryDevice code
> > 
> >  hw/i386/acpi-build.c                         |   3 +-
> >  hw/i386/pc.c                                 |  24 ++-
> >  hw/mem/Makefile.objs                         |   1 +
> >  hw/mem/memory-device.c                       | 282
> >  +++++++++++++++++++++++++
> >  hw/mem/pc-dimm.c                             | 304
> >  +++++++--------------------
> >  hw/ppc/spapr.c                               |  24 ++-
> >  hw/ppc/spapr_hcall.c                         |   1 +
> >  include/hw/boards.h                          |  16 ++
> >  include/hw/mem/memory-device.h               |  48 +++++
> >  include/hw/mem/pc-dimm.h                     |  26 +--
> >  numa.c                                       |   3 +-
> >  qmp.c                                        |   4 +-
> >  stubs/Makefile.objs                          |   2 +-
> >  stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
> >  14 files changed, 465 insertions(+), 277 deletions(-)
> >  create mode 100644 hw/mem/memory-device.c
> >  create mode 100644 include/hw/mem/memory-device.h
> >  rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)
> > 
> 
> 
>

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

On 23.04.2018 17:32, Pankaj Gupta wrote:
> 
> Hi Igor,
> 
>>
>>> Right now we can only map PCDIMM/NVDIMM into guest address space. In the
>>> future, we might want to do the same for virtio devices - e.g.
>>> virtio-pmem or virtio-mem. Especially, they should be able to live side
>>> by side to each other.
>>>
>>> E.g. the virto based memory devices regions will not be exposed via ACPI
>>> and friends. They will be detected just like other virtio devices and
>>> indicate the applicable memory region. This makes it possible to also use
>>> them on architectures without memory device detection support (e.g. s390x).
>>>
>>> Let's factor out the memory device code into a MemoryDevice interface.
>> A couple of high level questions as relevant code is not here:
>>
>>   1. what would hotplug/unplug call chain look like in case of virtio-pmem
>>   device
>>      (reason I'm asking is that pmem being PCI device would trigger
>>       PCI bus hotplug controller and then it somehow should piggyback
>>       to Machine provided hotplug handlers, so I wonder what kind of
>>       havoc it would cause on hotplug infrastructure)
> 
> For first phase we are using 'virtio-pmem' as cold added devices. AFAIU
> 'VirtioDeviceClass' being parent class and 'hotplug/unplug' methods implemented 
> for virtio-pmem device. So, pci bus hotplug/unplug should call the corresponding
> functions?
> 
>>
>>   2. why not use PCI bar mapping mechanism to do mapping since pmem is PCI
>>   device?
> 
> I think even we use if as PCI BAR mapping with PCI we still need free guest physical 
> address to provide to VM for mapping the memory range. For that there needs to 
> be coordination between PCDIMM and VIRTIO pci device? Also, if we use RAM from QEMU 
> address space tied to big region(system_memory) memory accounting gets easier and at single place.
> 
> Honestly speaking I don't think there will be much difference between the two approaches? unless
> I am missing something important?

The difference by gluing virtio devices to architecture specific
technologies will be unnecessary complicated. (my humble opinion)

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by Igor Mammedov 7 years, 9 months ago

On Mon, 23 Apr 2018 11:32:25 -0400 (EDT)
Pankaj Gupta <pagupta@redhat.com> wrote:

> Hi Igor,
> 
> >   
> > > Right now we can only map PCDIMM/NVDIMM into guest address space. In the
> > > future, we might want to do the same for virtio devices - e.g.
> > > virtio-pmem or virtio-mem. Especially, they should be able to live side
> > > by side to each other.
> > > 
> > > E.g. the virto based memory devices regions will not be exposed via ACPI
> > > and friends. They will be detected just like other virtio devices and
> > > indicate the applicable memory region. This makes it possible to also use
> > > them on architectures without memory device detection support (e.g. s390x).
> > > 
> > > Let's factor out the memory device code into a MemoryDevice interface.  
> > A couple of high level questions as relevant code is not here:
> > 
> >   1. what would hotplug/unplug call chain look like in case of virtio-pmem
> >   device
> >      (reason I'm asking is that pmem being PCI device would trigger
> >       PCI bus hotplug controller and then it somehow should piggyback
> >       to Machine provided hotplug handlers, so I wonder what kind of
> >       havoc it would cause on hotplug infrastructure)  
> 
> For first phase we are using 'virtio-pmem' as cold added devices. AFAIU
> 'VirtioDeviceClass' being parent class and 'hotplug/unplug' methods implemented 
> for virtio-pmem device. So, pci bus hotplug/unplug should call the corresponding
> functions?
the problem is with trying to use PCI bus based device with bus-less
infrastructure used by (pc|nv)dimms.

The important point which we should not to break here while trying to glue
PCI hotplug handler with machine hotplug handler is:

container MachineState::device_memory is owned by machine and
it's up to machine plug handler (container's owner) to map device's mr
into its address space.
(i.e. nor device's realize nor PCI bus hotplug handler should do it)

Not sure about virtio-mem but if it would use device_memory container,
it should use machine's plug handler.

I don't have out head ideas how to glue it cleanly, may be
MachineState::device_memory is just not right thing to use
for such devices.

> >   2. why not use PCI bar mapping mechanism to do mapping since pmem is PCI
> >   device?  
> 
> I think even we use if as PCI BAR mapping with PCI we still need free guest physical 
> address to provide to VM for mapping the memory range. For that there needs to 
> be coordination between PCDIMM and VIRTIO pci device? Also, if we use RAM from QEMU 
> address space tied to big region(system_memory) memory accounting gets easier and at single place.
if PCI bar mapping is used, then during accounting we would need to
additionally scan system_memory for virtio-pmem device to account
for its memory (not a huge deal as we even differentiate between
pc-dimm and nvdimm when building a list of memory devices so
special casing another device types won't hurt),
at least device management (most complex part) will be governed
by respective subsystem the device belongs to.

> Honestly speaking I don't think there will be much difference between the two approaches? unless
> I am missing something important?
> 
> > 
> >    
> > > v2 -> v3:
> > > - "pc-dimm: factor out MemoryDevice interface"  
> > > --> Lookup both classes when comparing (David Gibson)  
> > > 
> > > v1 -> v2:
> > > - Fix compile issues on ppc (still untested  )
> > > 
> > > 
> > > David Hildenbrand (3):
> > >   pc-dimm: factor out MemoryDevice interface
> > >   machine: make MemoryHotplugState accessible via the machine
> > >   pc-dimm: factor out address space logic into MemoryDevice code
> > > 
> > >  hw/i386/acpi-build.c                         |   3 +-
> > >  hw/i386/pc.c                                 |  24 ++-
> > >  hw/mem/Makefile.objs                         |   1 +
> > >  hw/mem/memory-device.c                       | 282
> > >  +++++++++++++++++++++++++
> > >  hw/mem/pc-dimm.c                             | 304
> > >  +++++++--------------------
> > >  hw/ppc/spapr.c                               |  24 ++-
> > >  hw/ppc/spapr_hcall.c                         |   1 +
> > >  include/hw/boards.h                          |  16 ++
> > >  include/hw/mem/memory-device.h               |  48 +++++
> > >  include/hw/mem/pc-dimm.h                     |  26 +--
> > >  numa.c                                       |   3 +-
> > >  qmp.c                                        |   4 +-
> > >  stubs/Makefile.objs                          |   2 +-
> > >  stubs/{qmp_pc_dimm.c => qmp_memory_device.c} |   4 +-
> > >  14 files changed, 465 insertions(+), 277 deletions(-)
> > >  create mode 100644 hw/mem/memory-device.c
> > >  create mode 100644 include/hw/mem/memory-device.h
> > >  rename stubs/{qmp_pc_dimm.c => qmp_memory_device.c} (61%)
> > >   
> > 
> > 
> >

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

On 24.04.2018 16:00, Igor Mammedov wrote:
> On Mon, 23 Apr 2018 11:32:25 -0400 (EDT)
> Pankaj Gupta <pagupta@redhat.com> wrote:
> 
>> Hi Igor,
>>
>>>   
>>>> Right now we can only map PCDIMM/NVDIMM into guest address space. In the
>>>> future, we might want to do the same for virtio devices - e.g.
>>>> virtio-pmem or virtio-mem. Especially, they should be able to live side
>>>> by side to each other.
>>>>
>>>> E.g. the virto based memory devices regions will not be exposed via ACPI
>>>> and friends. They will be detected just like other virtio devices and
>>>> indicate the applicable memory region. This makes it possible to also use
>>>> them on architectures without memory device detection support (e.g. s390x).
>>>>
>>>> Let's factor out the memory device code into a MemoryDevice interface.  
>>> A couple of high level questions as relevant code is not here:
>>>
>>>   1. what would hotplug/unplug call chain look like in case of virtio-pmem
>>>   device
>>>      (reason I'm asking is that pmem being PCI device would trigger
>>>       PCI bus hotplug controller and then it somehow should piggyback
>>>       to Machine provided hotplug handlers, so I wonder what kind of
>>>       havoc it would cause on hotplug infrastructure)  
>>
>> For first phase we are using 'virtio-pmem' as cold added devices. AFAIU
>> 'VirtioDeviceClass' being parent class and 'hotplug/unplug' methods implemented 
>> for virtio-pmem device. So, pci bus hotplug/unplug should call the corresponding
>> functions?
> the problem is with trying to use PCI bus based device with bus-less
> infrastructure used by (pc|nv)dimms.

I can understand your reasoning, but for me these are some QEMU internal details
that should not stop the virtio-(p)mem train from rolling.

In my world, device hotplug is composed of the following steps

1. Resource allocation
2. Attaching the device to a bus (making it accessible by the guest)
3. Notifying the guest

I would e.g. also call ACPI sort of a bus structure. Now, the machine hotplug
handler currently does parts of 1. and then hands of to ACPI to do 2 and 3.

virtio-mem and virtio-pmem do 1. partially in the realize function and then
let 2. and 3. be handled by the proxy device specific hotplug handlers.

Mean people might say that the machine should not call the ACPI code but there
should be a ACPI hotplug handler. So we would end up with the same result.

But anyhow, the resource allocation (getting an address and getting plugged) will
be done in the first step out of the virtio-(p)mem realize function:

static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
{
   ...
   /* try to get a mapping in guest address space */
    vm->phys_addr = memory_device_get_free_addr(MACHINE(qdev_get_machine))...
    if (local_err) {
        goto out;
    }
    ...

    /* register the memory region */
    memory_device_plug_region(MACHINE(qdev_get_machine()), vm->mr,
                              vm->phys_addr);
   ...
}

So this happens before any hotplug handler is called. Everything works
just fine. What you don't like about this is the qdev_get_machine(). I
also don't like it but in the short term I don't see any problem with
it. It is resource allocation and not a "device plug" in the typical form.

> 
> The important point which we should not to break here while trying to glue
> PCI hotplug handler with machine hotplug handler is:

I could later on imagine something like a 2 step approach.

1. resource allocation handler by a machine for MemoryDevices
- assigns address, registers memory region
2. hotplug handler (ACPI, PCI, CCW ...)
- assigns bus specific stuff, attaches device, notifies guest

Importantly the device is not visible to the guest until 2.
Of course, we could also take care of pre-plug things as you mentioned.

> 
> container MachineState::device_memory is owned by machine and
> it's up to machine plug handler (container's owner) to map device's mr
> into its address space.
> (i.e. nor device's realize nor PCI bus hotplug handler should do it)

I agree, but I think these are internal details.

> 
> Not sure about virtio-mem but if it would use device_memory container,
> it should use machine's plug handler.
> 
> I don't have out head ideas how to glue it cleanly, may be
> MachineState::device_memory is just not right thing to use
> for such devices.

I strongly disagree. From the user point of view it should not matter what
was added/plugged. There is just one guest physical memory and maxmem is
defined for one QEMU instance. Exposing such details to the user should
definitely be avoided.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by Igor Mammedov 7 years, 9 months ago

On Tue, 24 Apr 2018 17:42:44 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 24.04.2018 16:00, Igor Mammedov wrote:
> > On Mon, 23 Apr 2018 11:32:25 -0400 (EDT)
> > Pankaj Gupta <pagupta@redhat.com> wrote:
> >   
> >> Hi Igor,
> >>  
> >>>     
> >>>> Right now we can only map PCDIMM/NVDIMM into guest address space. In the
> >>>> future, we might want to do the same for virtio devices - e.g.
> >>>> virtio-pmem or virtio-mem. Especially, they should be able to live side
> >>>> by side to each other.
> >>>>
> >>>> E.g. the virto based memory devices regions will not be exposed via ACPI
> >>>> and friends. They will be detected just like other virtio devices and
> >>>> indicate the applicable memory region. This makes it possible to also use
> >>>> them on architectures without memory device detection support (e.g. s390x).
> >>>>
> >>>> Let's factor out the memory device code into a MemoryDevice interface.    
> >>> A couple of high level questions as relevant code is not here:
> >>>
> >>>   1. what would hotplug/unplug call chain look like in case of virtio-pmem
> >>>   device
> >>>      (reason I'm asking is that pmem being PCI device would trigger
> >>>       PCI bus hotplug controller and then it somehow should piggyback
> >>>       to Machine provided hotplug handlers, so I wonder what kind of
> >>>       havoc it would cause on hotplug infrastructure)    
> >>
> >> For first phase we are using 'virtio-pmem' as cold added devices. AFAIU
> >> 'VirtioDeviceClass' being parent class and 'hotplug/unplug' methods implemented 
> >> for virtio-pmem device. So, pci bus hotplug/unplug should call the corresponding
> >> functions?  
> > the problem is with trying to use PCI bus based device with bus-less
> > infrastructure used by (pc|nv)dimms.  
> 
> I can understand your reasoning, but for me these are some QEMU internal details
> that should not stop the virtio-(p)mem train from rolling.
If it's quickly hacked up prototypes to play with than it's fine
as far as they are not being merged into QEMU.
If one plans to merge it, then code should be adapted to
whatever QEMU internal requirements are.

> In my world, device hotplug is composed of the following steps
> 
> 1. Resource allocation
> 2. Attaching the device to a bus (making it accessible by the guest)
> 3. Notifying the guest
> 
> I would e.g. also call ACPI sort of a bus structure. Now, the machine hotplug
> handler currently does parts of 1. and then hands of to ACPI to do 2 and 3.
it's not a bus, it's concrete device implementing GPE logic,
on x86 it does the job on notifier #3 in case of hotplug.

> virtio-mem and virtio-pmem do 1. partially in the realize function and then
> let 2. and 3. be handled by the proxy device specific hotplug handlers.
> 
> Mean people might say that the machine should not call the ACPI code but there
> should be a ACPI hotplug handler. So we would end up with the same result.
it should be fine for parent to manage its children but not other way around


> But anyhow, the resource allocation (getting an address and getting plugged) will
> be done in the first step out of the virtio-(p)mem realize function:
> 
> static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
> {
>    ...
>    /* try to get a mapping in guest address space */
>     vm->phys_addr = memory_device_get_free_addr(MACHINE(qdev_get_machine))...
this should be a property, and if it's not set then realize should error out

>     if (local_err) {
>         goto out;
>     }
>     ...
> 
>     /* register the memory region */
>     memory_device_plug_region(MACHINE(qdev_get_machine()), vm->mr,
>                               vm->phys_addr);
>    ...
> }
> 
> So this happens before any hotplug handler is called. Everything works
> just fine. What you don't like about this is the qdev_get_machine(). I
> also don't like it but in the short term I don't see any problem with
> it. It is resource allocation and not a "device plug" in the typical form.

It's not qdev_get_machine() that's issue, it's layer violation,
where child device is allocating and mapping resources of one of its parents.

that's been an issue and show stopper for patches in the past,
and that's probably not going to change in this case either.

 
> > The important point which we should not to break here while trying to glue
> > PCI hotplug handler with machine hotplug handler is:  
> 
> I could later on imagine something like a 2 step approach.
> 
> 1. resource allocation handler by a machine for MemoryDevices
> - assigns address, registers memory region
> 2. hotplug handler (ACPI, PCI, CCW ...)
> - assigns bus specific stuff, attaches device, notifies guest
>
> Importantly the device is not visible to the guest until 2.
So far it's about how QEMU models and manages wiring process,
that's why pre_plug/plug handlers were introduced, to allow
resource owner to attach devices that's plugged into it.

i.e. PCI devices are managed by PCI subsystem and DIMM
devices are managed by board where they are mapped into
reserved address space by board code that owns it.

Allowing random device to manage board resources directly
isn't really acceptable (even as temporary solution).

In case of virtio-pmem it might be much cleaner to use
mapping mechanism provided by PCI sybsytem than trying
to bridge bus and buss-less device wiring as from device
modeling point of view (aside from providing RAM to guest)
it's 2 quite different devices.

i.e. if you think new device is RAM, which is governed by
-m option, then model it as bus-less device like dimm and
plug it directly into board, if its plugged in to a bus
it's that bus owner responsibility to allocate/manage
address space or bridge it to parent device.

(btw: virtio-pmem looks sort of like ivshmem, maybe they
can share some code on qemu side)

> Of course, we could also take care of pre-plug things as you mentioned.
> 
> > 
> > container MachineState::device_memory is owned by machine and
> > it's up to machine plug handler (container's owner) to map device's mr
> > into its address space.
> > (i.e. nor device's realize nor PCI bus hotplug handler should do it)  
> 
> I agree, but I think these are internal details.
it's internal details that we choose not to violate in QEMU
and were working towards that direction, getting rid of places
that do it wrongly.

> > Not sure about virtio-mem but if it would use device_memory container,
> > it should use machine's plug handler.
> > 
> > I don't have out head ideas how to glue it cleanly, may be
> > MachineState::device_memory is just not right thing to use
> > for such devices.  
> 
> I strongly disagree. From the user point of view it should not matter what
> was added/plugged. There is just one guest physical memory and maxmem is
> defined for one QEMU instance. Exposing such details to the user should
> definitely be avoided.
qemu user have to be exposed to details as he already adds
 -device virtio-pmem,....
to CLI, maxmem accounting is a separate matter and probably
shouldn't be mixed with device model and how it's mapped into
guest's address space.

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

>>>> For first phase we are using 'virtio-pmem' as cold added devices. AFAIU
>>>> 'VirtioDeviceClass' being parent class and 'hotplug/unplug' methods implemented 
>>>> for virtio-pmem device. So, pci bus hotplug/unplug should call the corresponding
>>>> functions?  
>>> the problem is with trying to use PCI bus based device with bus-less
>>> infrastructure used by (pc|nv)dimms.  
>>
>> I can understand your reasoning, but for me these are some QEMU internal details
>> that should not stop the virtio-(p)mem train from rolling.
> If it's quickly hacked up prototypes to play with than it's fine
> as far as they are not being merged into QEMU.
> If one plans to merge it, then code should be adapted to
> whatever QEMU internal requirements are.

At one point we will have to decide if we want to develop good software
(which tolerates layer violations if there is a good excuse) or build
the perfect internal architecture. And we all know the latter is not the
case right now and never will be.

So yes, I will be looking into ways to make this work "nicer"
internally, but quite frankly, it has very little priority.

> 
>> In my world, device hotplug is composed of the following steps
>>
>> 1. Resource allocation
>> 2. Attaching the device to a bus (making it accessible by the guest)
>> 3. Notifying the guest
>>
>> I would e.g. also call ACPI sort of a bus structure. Now, the machine hotplug
>> handler currently does parts of 1. and then hands of to ACPI to do 2 and 3.
> it's not a bus, it's concrete device implementing GPE logic,
> on x86 it does the job on notifier #3 in case of hotplug.
> 
>> virtio-mem and virtio-pmem do 1. partially in the realize function and then
>> let 2. and 3. be handled by the proxy device specific hotplug handlers.
>>
>> Mean people might say that the machine should not call the ACPI code but there
>> should be a ACPI hotplug handler. So we would end up with the same result.
> it should be fine for parent to manage its children but not other way around

A virtio-bus (e.g. CCW) also "belongs" to the machine. But we won't
start to pass all device starting from the machine downwards to the
concrete implementation.

(but I get your point)

> 
> 
>> But anyhow, the resource allocation (getting an address and getting plugged) will
>> be done in the first step out of the virtio-(p)mem realize function:
>>
>> static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>> {
>>    ...
>>    /* try to get a mapping in guest address space */
>>     vm->phys_addr = memory_device_get_free_addr(MACHINE(qdev_get_machine))...
> this should be a property, and if it's not set then realize should error out

It is a property but if it is 0 we do auto-detection right now (like DIMM)

> 
>>     if (local_err) {
>>         goto out;
>>     }
>>     ...
>>
>>     /* register the memory region */
>>     memory_device_plug_region(MACHINE(qdev_get_machine()), vm->mr,
>>                               vm->phys_addr);
>>    ...
>> }
>>
>> So this happens before any hotplug handler is called. Everything works
>> just fine. What you don't like about this is the qdev_get_machine(). I
>> also don't like it but in the short term I don't see any problem with
>> it. It is resource allocation and not a "device plug" in the typical form.
> 
> It's not qdev_get_machine() that's issue, it's layer violation,
> where child device is allocating and mapping resources of one of its parents.

Quite simple: introduce a function at the machine where the child can
"request" to get an address and "request" to plug/unplug a region.

Or what would be wrong about that?

> 
> that's been an issue and show stopper for patches in the past,
> and that's probably not going to change in this case either.
> 

I can see that, but again, for me these are internal details.

>  
>>> The important point which we should not to break here while trying to glue
>>> PCI hotplug handler with machine hotplug handler is:  
>>
>> I could later on imagine something like a 2 step approach.
>>
>> 1. resource allocation handler by a machine for MemoryDevices
>> - assigns address, registers memory region
>> 2. hotplug handler (ACPI, PCI, CCW ...)
>> - assigns bus specific stuff, attaches device, notifies guest
>>
>> Importantly the device is not visible to the guest until 2.
> So far it's about how QEMU models and manages wiring process,
> that's why pre_plug/plug handlers were introduced, to allow
> resource owner to attach devices that's plugged into it.
> 
> i.e. PCI devices are managed by PCI subsystem and DIMM
> devices are managed by board where they are mapped into
> reserved address space by board code that owns it.
> 
> Allowing random device to manage board resources directly
> isn't really acceptable (even as temporary solution).

I agree to "random" devices. This should not be the design principle.

> 
> In case of virtio-pmem it might be much cleaner to use
> mapping mechanism provided by PCI sybsytem than trying
> to bridge bus and buss-less device wiring as from device
> modeling point of view (aside from providing RAM to guest)
> it's 2 quite different devices.

And again: Please don't forget virtio-ccw. We _don't_ want to glue
virtio device specifics to the underlying proxy here.

> 
> i.e. if you think new device is RAM, which is governed by
> -m option, then model it as bus-less device like dimm and
> plug it directly into board, if its plugged in to a bus
> it's that bus owner responsibility to allocate/manage
> address space or bridge it to parent device.
> 
> (btw: virtio-pmem looks sort of like ivshmem, maybe they
> can share some code on qemu side)
> 
>> Of course, we could also take care of pre-plug things as you mentioned.
>>
>>>
>>> container MachineState::device_memory is owned by machine and
>>> it's up to machine plug handler (container's owner) to map device's mr
>>> into its address space.
>>> (i.e. nor device's realize nor PCI bus hotplug handler should do it)  
>>
>> I agree, but I think these are internal details.
> it's internal details that we choose not to violate in QEMU
> and were working towards that direction, getting rid of places
> that do it wrongly.

Yes, and I'll try my best to avoid it.

> 
>>> Not sure about virtio-mem but if it would use device_memory container,
>>> it should use machine's plug handler.
>>>
>>> I don't have out head ideas how to glue it cleanly, may be
>>> MachineState::device_memory is just not right thing to use
>>> for such devices.  
>>
>> I strongly disagree. From the user point of view it should not matter what
>> was added/plugged. There is just one guest physical memory and maxmem is
>> defined for one QEMU instance. Exposing such details to the user should
>> definitely be avoided.
> qemu user have to be exposed to details as he already adds
>  -device virtio-pmem,....
> to CLI, maxmem accounting is a separate matter and probably
> shouldn't be mixed with device model and how it's mapped into
> guest's address space.

I can't follow. Please step back and have a look at how it works on the
qemu command line:

1. You specify a maxmem option
2. You plug DIMM/NVDIMM/virtio-mem/virtio-pmem

Some machines (e.g. s390x) use maxmem to setup the maximum possible
guest address space in KVM.

Just because DIMM/NVDIMM was the first user does not mean that it is the
only valid user. That is also the reason why it is named
"query-memory-devices" and not "query-dimm-devices". The abstraction is
there for a reason.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v3 0/3] pc-dimm: factor out MemoryDevice

Posted by David Hildenbrand 7 years, 9 months ago

>>> So this happens before any hotplug handler is called. Everything works
>>> just fine. What you don't like about this is the qdev_get_machine(). I
>>> also don't like it but in the short term I don't see any problem with
>>> it. It is resource allocation and not a "device plug" in the typical form.
>>
>> It's not qdev_get_machine() that's issue, it's layer violation,
>> where child device is allocating and mapping resources of one of its parents.
> 
> Quite simple: introduce a function at the machine where the child can
> "request" to get an address and "request" to plug/unplug a region.
> 
> Or what would be wrong about that?
> 

To extend on such an idea (looking at virtio_bus_device_plugged()
getting called from virtio realize functions)

Make the machine implement some new interface "MemoryDeviceManager".
From virtio-mem/pmem realize, call a function like memory_device_plugged().

Lookup the machine (qdev_get_machine() or walk all the way up until we
find one that implements MemoryDeviceManager) and call
pre_plugged/plugged/unplugged.

We could hide that in a new device class TYPE_VIRTIO_MEMORY_DEVICE.

Opinions? Completely nonsense? :) Alternatives?

-- 

Thanks,

David / dhildenb