[Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.

remy.noel@blade-group.com posted 1 patch 5 years, 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180720081948.23644-1-remy.noel@blade-group.com
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test checkpatch passed
hw/display/vga-pci.c | 1 +
1 file changed, 1 insertion(+)
[Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by remy.noel@blade-group.com 5 years, 9 months ago
From: "Remy Noel" <remy.noel@blade-group.com>

When removing a secondary-vga device and then adding it back (or adding
an other one), qemu aborts with:
    "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".

It is caused by the vram staying registered, preventing vga replugging.

Signed-off-by: Remy Noel <remy.noel@blade-group.com>
---
 hw/display/vga-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/display/vga-pci.c b/hw/display/vga-pci.c
index e9e62eac70..1c89571e46 100644
--- a/hw/display/vga-pci.c
+++ b/hw/display/vga-pci.c
@@ -288,6 +288,7 @@ static void pci_secondary_vga_exit(PCIDevice *dev)
     VGACommonState *s = &d->vga;
 
     graphic_console_close(s->con);
+    vmstate_unregister_ram(&s->vram, DEVICE(dev));
 }
 
 static void pci_secondary_vga_init(Object *obj)
-- 
2.18.0


Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Gerd Hoffmann 5 years, 8 months ago
On Fri, Jul 20, 2018 at 10:19:48AM +0200, remy.noel@blade-group.com wrote:
> From: "Remy Noel" <remy.noel@blade-group.com>
> 
> When removing a secondary-vga device and then adding it back (or adding
> an other one), qemu aborts with:
>     "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".
> 
> It is caused by the vram staying registered, preventing vga replugging.

David?  Does that look ok?

This balances the

     vmstate_register_ram(&s->vram, s->global_vmstate ?  NULL : DEVICE(obj));

call in vga_common_init().  I'm wondering whenever the manual cleanup is
actually needed in case owner is not NULL?

thanks,
  Gerd


Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Dr. David Alan Gilbert 5 years, 8 months ago
* Gerd Hoffmann (kraxel@redhat.com) wrote:
> On Fri, Jul 20, 2018 at 10:19:48AM +0200, remy.noel@blade-group.com wrote:
> > From: "Remy Noel" <remy.noel@blade-group.com>
> > 
> > When removing a secondary-vga device and then adding it back (or adding
> > an other one), qemu aborts with:
> >     "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".
> > 
> > It is caused by the vram staying registered, preventing vga replugging.
> 
> David?  Does that look ok?
> 
> This balances the
> 
>      vmstate_register_ram(&s->vram, s->global_vmstate ?  NULL : DEVICE(obj));
> 
> call in vga_common_init().  I'm wondering whenever the manual cleanup is
> actually needed in case owner is not NULL?

I can't see anyone who is calling unregister_ram or the functions it
calls as part of generic device cleanup, so I think it IS needed
to manually do it.

Which is a bit worrying since we have vastly more register's than
unregister's.

Dave

> thanks,
>   Gerd
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Peter Maydell 5 years, 8 months ago
On 7 August 2018 at 15:57, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> * Gerd Hoffmann (kraxel@redhat.com) wrote:
>> On Fri, Jul 20, 2018 at 10:19:48AM +0200, remy.noel@blade-group.com wrote:
>> > From: "Remy Noel" <remy.noel@blade-group.com>
>> >
>> > When removing a secondary-vga device and then adding it back (or adding
>> > an other one), qemu aborts with:
>> >     "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".
>> >
>> > It is caused by the vram staying registered, preventing vga replugging.
>>
>> David?  Does that look ok?
>>
>> This balances the
>>
>>      vmstate_register_ram(&s->vram, s->global_vmstate ?  NULL : DEVICE(obj));
>>
>> call in vga_common_init().  I'm wondering whenever the manual cleanup is
>> actually needed in case owner is not NULL?
>
> I can't see anyone who is calling unregister_ram or the functions it
> calls as part of generic device cleanup, so I think it IS needed
> to manually do it.
>
> Which is a bit worrying since we have vastly more register's than
> unregister's.

Paolo suggested in an email last month that vmstate_unregister_ram()
should simply not exist, because it doesn't actually do anything useful:
https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg01125.html

(ie it was added in the first place because we'd ended up with
two identically named ramblocks, but that only happened because
a reference-counting bug meant we hadn't deleted the first one
properly before creating the second.)

So I think that the bug reported in this thread is similar:
the problem is not that we're not calling vmstate_unregister_ram(),
but that when the first instance of secondary-vga is removed
it is not correctly destroying the ramblock.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Dr. David Alan Gilbert 5 years, 8 months ago
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 7 August 2018 at 15:57, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > * Gerd Hoffmann (kraxel@redhat.com) wrote:
> >> On Fri, Jul 20, 2018 at 10:19:48AM +0200, remy.noel@blade-group.com wrote:
> >> > From: "Remy Noel" <remy.noel@blade-group.com>
> >> >
> >> > When removing a secondary-vga device and then adding it back (or adding
> >> > an other one), qemu aborts with:
> >> >     "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".
> >> >
> >> > It is caused by the vram staying registered, preventing vga replugging.
> >>
> >> David?  Does that look ok?
> >>
> >> This balances the
> >>
> >>      vmstate_register_ram(&s->vram, s->global_vmstate ?  NULL : DEVICE(obj));
> >>
> >> call in vga_common_init().  I'm wondering whenever the manual cleanup is
> >> actually needed in case owner is not NULL?
> >
> > I can't see anyone who is calling unregister_ram or the functions it
> > calls as part of generic device cleanup, so I think it IS needed
> > to manually do it.
> >
> > Which is a bit worrying since we have vastly more register's than
> > unregister's.
> 
> Paolo suggested in an email last month that vmstate_unregister_ram()
> should simply not exist, because it doesn't actually do anything useful:
> https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg01125.html
> 
> (ie it was added in the first place because we'd ended up with
> two identically named ramblocks, but that only happened because
> a reference-counting bug meant we hadn't deleted the first one
> properly before creating the second.)
> 
> So I think that the bug reported in this thread is similar:
> the problem is not that we're not calling vmstate_unregister_ram(),
> but that when the first instance of secondary-vga is removed
> it is not correctly destroying the ramblock.

Ah yes that makes more sense; I remember there was another similar bug
where a device screwed up and didn't delete it's RAM causing similar
problems.

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Remy NOEL 5 years, 8 months ago
On 08/07/2018 05:09 PM, Dr. David Alan Gilbert wrote:

> * Peter Maydell (peter.maydell@linaro.org) wrote:
>> On 7 August 2018 at 15:57, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> * Gerd Hoffmann (kraxel@redhat.com) wrote:
>>>> On Fri, Jul 20, 2018 at 10:19:48AM +0200, remy.noel@blade-group.com wrote:
>>>>> From: "Remy Noel" <remy.noel@blade-group.com>
>>>>>
>>>>> When removing a secondary-vga device and then adding it back (or adding
>>>>> an other one), qemu aborts with:
>>>>>      "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".
>>>>>
>>>>> It is caused by the vram staying registered, preventing vga replugging.
>>>> David?  Does that look ok?
>>>>
>>>> This balances the
>>>>
>>>>       vmstate_register_ram(&s->vram, s->global_vmstate ?  NULL : DEVICE(obj));
>>>>
>>>> call in vga_common_init().  I'm wondering whenever the manual cleanup is
>>>> actually needed in case owner is not NULL?
>>> I can't see anyone who is calling unregister_ram or the functions it
>>> calls as part of generic device cleanup, so I think it IS needed
>>> to manually do it.
>>>
>>> Which is a bit worrying since we have vastly more register's than
>>> unregister's.
>> Paolo suggested in an email last month that vmstate_unregister_ram()
>> should simply not exist, because it doesn't actually do anything useful:
>> https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg01125.html
>>
>> (ie it was added in the first place because we'd ended up with
>> two identically named ramblocks, but that only happened because
>> a reference-counting bug meant we hadn't deleted the first one
>> properly before creating the second.)
>>
>> So I think that the bug reported in this thread is similar:
>> the problem is not that we're not calling vmstate_unregister_ram(),
>> but that when the first instance of secondary-vga is removed
>> it is not correctly destroying the ramblock.
> Ah yes that makes more sense; I remember there was another similar bug
> where a device screwed up and didn't delete it's RAM causing similar
> problems.
>
> Dave
Thanks for the feedback, after closer inspection, the secondary-vga 
refcount does, indeed, never reach 0.

I noticed the bug was not present in v2.12.0 and had been visible since 
93abfc88bd649de1933588bfc7175605331b3ea9 
(https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07547.html).

This patch causes the secondary-vga object to be referenced by its 
subregions (mrs) which are themselves referenced by its mmio region 
which is referenced by the device causing a reference loop.
We should probably break this loop upon exit, however, i am not sure 
whether we should deletes the subregions or delete the mmio properly.

Thanks

Remy

Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Paolo Bonzini 5 years, 8 months ago
On 11/08/2018 21:07, Remy NOEL wrote:
> On 08/07/2018 05:09 PM, Dr. David Alan Gilbert wrote:
> 
>> * Peter Maydell (peter.maydell@linaro.org) wrote:
>>> On 7 August 2018 at 15:57, Dr. David Alan Gilbert
>>> <dgilbert@redhat.com> wrote:
>>>> * Gerd Hoffmann (kraxel@redhat.com) wrote:
>>>>> On Fri, Jul 20, 2018 at 10:19:48AM +0200, remy.noel@blade-group.com
>>>>> wrote:
>>>>>> From: "Remy Noel" <remy.noel@blade-group.com>
>>>>>>
>>>>>> When removing a secondary-vga device and then adding it back (or
>>>>>> adding
>>>>>> an other one), qemu aborts with:
>>>>>>      "RAMBlock "0000:00:02.0/vga.vram" already registered, abort!".
>>>>>>
>>>>>> It is caused by the vram staying registered, preventing vga
>>>>>> replugging.
>>>>> David?  Does that look ok?
>>>>>
>>>>> This balances the
>>>>>
>>>>>       vmstate_register_ram(&s->vram, s->global_vmstate ?  NULL :
>>>>> DEVICE(obj));
>>>>>
>>>>> call in vga_common_init().  I'm wondering whenever the manual
>>>>> cleanup is
>>>>> actually needed in case owner is not NULL?
>>>> I can't see anyone who is calling unregister_ram or the functions it
>>>> calls as part of generic device cleanup, so I think it IS needed
>>>> to manually do it.
>>>>
>>>> Which is a bit worrying since we have vastly more register's than
>>>> unregister's.
>>> Paolo suggested in an email last month that vmstate_unregister_ram()
>>> should simply not exist, because it doesn't actually do anything useful:
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg01125.html
>>>
>>> (ie it was added in the first place because we'd ended up with
>>> two identically named ramblocks, but that only happened because
>>> a reference-counting bug meant we hadn't deleted the first one
>>> properly before creating the second.)
>>>
>>> So I think that the bug reported in this thread is similar:
>>> the problem is not that we're not calling vmstate_unregister_ram(),
>>> but that when the first instance of secondary-vga is removed
>>> it is not correctly destroying the ramblock.
>> Ah yes that makes more sense; I remember there was another similar bug
>> where a device screwed up and didn't delete it's RAM causing similar
>> problems.
>>
>> Dave
> Thanks for the feedback, after closer inspection, the secondary-vga
> refcount does, indeed, never reach 0.
> 
> I noticed the bug was not present in v2.12.0 and had been visible since
> 93abfc88bd649de1933588bfc7175605331b3ea9
> (https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07547.html).
> 
> This patch causes the secondary-vga object to be referenced by its
> subregions (mrs) which are themselves referenced by its mmio region
> which is referenced by the device causing a reference loop.
> We should probably break this loop upon exit, however, i am not sure
> whether we should deletes the subregions or delete the mmio properly.

I'll take a look...

Paolo


Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Gerd Hoffmann 5 years, 7 months ago
  Hi,

> > Thanks for the feedback, after closer inspection, the secondary-vga
> > refcount does, indeed, never reach 0.
> > 
> > I noticed the bug was not present in v2.12.0 and had been visible since
> > 93abfc88bd649de1933588bfc7175605331b3ea9
> > (https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07547.html).
> > 
> > This patch causes the secondary-vga object to be referenced by its
> > subregions (mrs) which are themselves referenced by its mmio region
> > which is referenced by the device causing a reference loop.
> > We should probably break this loop upon exit, however, i am not sure
> > whether we should deletes the subregions or delete the mmio properly.
> 
> I'll take a look...

Ping, any results?

I'm wondering whenever we should just revert
93abfc88bd649de1933588bfc7175605331b3ea9.

Retested hotplug with 93abfc88bd649de1933588bfc7175605331b3ea9 reverted.
Works just fine with guest kernel loaded.  Doesn't work without guest
booted (grub prompt for example), but pci hotplug requires guest
cooperation so this isn't a bug in secondary-vga ...

cheers,
  Gerd


Re: [Qemu-devel] [PATCH] secondary-vga: unregister vram on unplug.
Posted by Remy NOEL 5 years, 6 months ago
Hi

On 8/30/18 1:28 PM, Gerd Hoffmann wrote:

>> I'll take a look...
> Ping, any results?
>
> I'm wondering whenever we should just revert
> 93abfc88bd649de1933588bfc7175605331b3ea9.
>
> Retested hotplug with 93abfc88bd649de1933588bfc7175605331b3ea9 reverted.
> Works just fine with guest kernel loaded.  Doesn't work without guest
> booted (grub prompt for example), but pci hotplug requires guest
> cooperation so this isn't a bug in secondary-vga ...
>
> cheers,
>    Gerd

I finally got the time to get back on this issue.

Just deleting the 3 subregions from the mmio in the exit callback does 
the job and everything works fine. I'll re-submit the patch in a new thread.

Reverting 93abfc88bd649de1933588bfc7175605331b3ea9 indeed frees the 
object but the hanging memory regions seems to cause some problems upon 
reset::

   - After deleting a device and adding it back, i get a segfault when 
calling system_reset:

                 #0  0x00007f90611acab8 g_hash_table_iter_next 
(libglib-2.0.so.0)
                 #1  0x000055a631e7aa39 object_resolve_partial_path 
(qemu-system-x86_64)
                 #2  0x000055a631e7a9ea object_resolve_partial_path 
(qemu-system-x86_64)
                 #3  0x000055a631e7a9ea object_resolve_partial_path 
(qemu-system-x86_64)
                 #4  0x000055a631e7a9ea object_resolve_partial_path 
(qemu-system-x86_64)
                 #5  0x000055a631e7aafd object_resolve_path_type 
(qemu-system-x86_64)
                 #6  0x000055a631c29879 piix4_pm_find (qemu-system-x86_64)
                 #7  0x000055a631b2d4ed acpi_get_pm_info 
(qemu-system-x86_64)
                 #8  0x000055a631b35903 acpi_build (qemu-system-x86_64)
                 #9  0x000055a631b3618d acpi_build_update 
(qemu-system-x86_64)
                 #10 0x000055a631d27cb3 fw_cfg_select (qemu-system-x86_64)
                 #11 0x000055a631d2848c fw_cfg_comb_write 
(qemu-system-x86_64)
                 #12 0x000055a631a5ebd7 memory_region_write_accessor 
(qemu-system-x86_64)
                 #13 0x000055a631a5edec access_with_adjusted_size 
(qemu-system-x86_64)
                 #14 0x000055a631a61ac2 memory_region_dispatch_write 
(qemu-system-x86_64)
                 #15 0x000055a6319fcbf9 flatview_write_continue 
(qemu-system-x86_64)
                 #16 0x000055a6319fcd43 flatview_write (qemu-system-x86_64)
                 #17 0x000055a6319fd049 address_space_write 
(qemu-system-x86_64)
                 #18 0x000055a6319fd09a address_space_rw 
(qemu-system-x86_64)
                 #19 0x000055a631a7cd66 kvm_handle_io (qemu-system-x86_64)
                 #20 0x000055a631a7d499 kvm_cpu_exec (qemu-system-x86_64)
                 #21 0x000055a631a43cb7 qemu_kvm_cpu_thread_fn 
(qemu-system-x86_64)
                 #22 0x000055a631fbbadd qemu_thread_start 
(qemu-system-x86_64)
                 #23 0x00007f905c851a9d start_thread (libpthread.so.0)
                 #24 0x00007f905c781a43 __clone (libc.so.6)

   - When calling system_reset just after removing a device, we get a 
bunch of:

         GLib-CRITICAL **: 10:16:29.020: g_hash_table_iter_init: 
assertion 'hash_table != NULL' failed

         GLib-CRITICAL **: 10:16:29.023: g_hash_table_iter_next: 
assertion 'ri->version == ri->hash_table->version' failed


I did not inquiry those further though as the subregions removal did 
lead to a clean state.


Remy Noel