[PATCH] virtio-balloon: Fix page-poison subsection name

Dr. David Alan Gilbert (git) posted 1 patch 1 week, 3 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210914131716.102851-1-dgilbert@redhat.com
Maintainers: David Hildenbrand <david@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
hw/virtio/virtio-balloon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Dr. David Alan Gilbert (git) 1 week, 3 days ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The subsection name for page-poison was typo'd as:

  vitio-balloon-device/page-poison

Note the missing 'r' in virtio.

When we have a machine type that enables page poison, and the guest
enables it (which needs a new kernel), things fail rather unpredictably.

The fallout from this is that most of the other subsections fail to
load, including things like the feature bits in the device, one
possible fallout is that the physical addresses of the queues
then get aligned differently and we fail with an error about
last_avail_idx being wrong.
It's not obvious to me why this doesn't produce a more obvious failure,
but virtio's vmstate loading is a bit open-coded.

Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/virtio-balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 5a69dce35d..c6962fcbfe 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
 };
 
 static const VMStateDescription vmstate_virtio_balloon_page_poison = {
-    .name = "vitio-balloon-device/page-poison",
+    .name = "virtio-balloon-device/page-poison",
     .version_id = 1,
     .minimum_version_id = 1,
     .needed = virtio_balloon_page_poison_support,
-- 
2.31.1


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Philippe Mathieu-Daudé 1 week, 3 days ago
On 9/14/21 3:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The subsection name for page-poison was typo'd as:
> 
>   vitio-balloon-device/page-poison
> 
> Note the missing 'r' in virtio.
> 
> When we have a machine type that enables page poison, and the guest
> enables it (which needs a new kernel), things fail rather unpredictably.

IIUC since v5.1 guests have 'page-poison'=true but once migrated
they become 'page-poison'=unset=false?

> The fallout from this is that most of the other subsections fail to
> load, including things like the feature bits in the device, one
> possible fallout is that the physical addresses of the queues
> then get aligned differently and we fail with an error about
> last_avail_idx being wrong.
> It's not obvious to me why this doesn't produce a more obvious failure,
> but virtio's vmstate loading is a bit open-coded.

:/

> Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 5a69dce35d..c6962fcbfe 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
>  };
>  
>  static const VMStateDescription vmstate_virtio_balloon_page_poison = {
> -    .name = "vitio-balloon-device/page-poison",
> +    .name = "virtio-balloon-device/page-poison",
>      .version_id = 1,
>      .minimum_version_id = 1,
>      .needed = virtio_balloon_page_poison_support,
> 


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by David Hildenbrand 1 week, 3 days ago
On 14.09.21 15:47, Philippe Mathieu-Daudé wrote:
> On 9/14/21 3:17 PM, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> The subsection name for page-poison was typo'd as:
>>
>>    vitio-balloon-device/page-poison
>>
>> Note the missing 'r' in virtio.
>>
>> When we have a machine type that enables page poison, and the guest
>> enables it (which needs a new kernel), things fail rather unpredictably.
> 
> IIUC since v5.1 guests have 'page-poison'=true but once migrated
> they become 'page-poison'=unset=false?

We only migrate the subsection if the guest supports the feature (-> 
newer guest kernel).

If we migrate, it's a broken migration stream. If we don't migrate, 
everything is fine.

-- 
Thanks,

David / dhildenb


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by David Hildenbrand 1 week, 3 days ago
On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The subsection name for page-poison was typo'd as:
> 
>    vitio-balloon-device/page-poison
> 
> Note the missing 'r' in virtio.
> 
> When we have a machine type that enables page poison, and the guest
> enables it (which needs a new kernel), things fail rather unpredictably.
> 
> The fallout from this is that most of the other subsections fail to
> load, including things like the feature bits in the device, one
> possible fallout is that the physical addresses of the queues
> then get aligned differently and we fail with an error about
> last_avail_idx being wrong.
> It's not obvious to me why this doesn't produce a more obvious failure,
> but virtio's vmstate loading is a bit open-coded.
> 
> Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   hw/virtio/virtio-balloon.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 5a69dce35d..c6962fcbfe 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
>   };
>   
>   static const VMStateDescription vmstate_virtio_balloon_page_poison = {
> -    .name = "vitio-balloon-device/page-poison",
> +    .name = "virtio-balloon-device/page-poison",
>       .version_id = 1,
>       .minimum_version_id = 1,
>       .needed = virtio_balloon_page_poison_support,
> 

Oh, that's very subtle. I wasn't even aware that the prefix really has 
to match the actual device ... I thought the whole idea of the prefix 
here was just to make the string unique ...

Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks!

-- 
Thanks,

David / dhildenb


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Dr. David Alan Gilbert 1 week, 3 days ago
* David Hildenbrand (david@redhat.com) wrote:
> On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The subsection name for page-poison was typo'd as:
> > 
> >    vitio-balloon-device/page-poison
> > 
> > Note the missing 'r' in virtio.
> > 
> > When we have a machine type that enables page poison, and the guest
> > enables it (which needs a new kernel), things fail rather unpredictably.
> > 
> > The fallout from this is that most of the other subsections fail to
> > load, including things like the feature bits in the device, one
> > possible fallout is that the physical addresses of the queues
> > then get aligned differently and we fail with an error about
> > last_avail_idx being wrong.
> > It's not obvious to me why this doesn't produce a more obvious failure,
> > but virtio's vmstate loading is a bit open-coded.
> > 
> > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >   hw/virtio/virtio-balloon.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 5a69dce35d..c6962fcbfe 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
> >   };
> >   static const VMStateDescription vmstate_virtio_balloon_page_poison = {
> > -    .name = "vitio-balloon-device/page-poison",
> > +    .name = "virtio-balloon-device/page-poison",
> >       .version_id = 1,
> >       .minimum_version_id = 1,
> >       .needed = virtio_balloon_page_poison_support,
> > 
> 
> Oh, that's very subtle. I wasn't even aware that the prefix really has to
> match the actual device ... I thought the whole idea of the prefix here was
> just to make the string unique ...

Subsection naming is *very* critical; the logic is something like:
  'we're loading the X device'
a subsection arrives for 'N/P'
if 'X==N' then it looks in X for subsection P.
If 'X!=N' then it assumes we've finished loading X
and P is really for an outer device that X is part of.
This is horrible.

Dave

> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Dr. David Alan Gilbert 1 week, 2 days ago
Note this also correspounds to:

https://gitlab.com/qemu-project/qemu/-/issues/485


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Daniel P. Berrangé 1 week, 3 days ago
On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
> > On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > The subsection name for page-poison was typo'd as:
> > > 
> > >    vitio-balloon-device/page-poison
> > > 
> > > Note the missing 'r' in virtio.
> > > 
> > > When we have a machine type that enables page poison, and the guest
> > > enables it (which needs a new kernel), things fail rather unpredictably.
> > > 
> > > The fallout from this is that most of the other subsections fail to
> > > load, including things like the feature bits in the device, one
> > > possible fallout is that the physical addresses of the queues
> > > then get aligned differently and we fail with an error about
> > > last_avail_idx being wrong.
> > > It's not obvious to me why this doesn't produce a more obvious failure,
> > > but virtio's vmstate loading is a bit open-coded.
> > > 
> > > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
> > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >   hw/virtio/virtio-balloon.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > > index 5a69dce35d..c6962fcbfe 100644
> > > --- a/hw/virtio/virtio-balloon.c
> > > +++ b/hw/virtio/virtio-balloon.c
> > > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
> > >   };
> > >   static const VMStateDescription vmstate_virtio_balloon_page_poison = {
> > > -    .name = "vitio-balloon-device/page-poison",
> > > +    .name = "virtio-balloon-device/page-poison",
> > >       .version_id = 1,
> > >       .minimum_version_id = 1,
> > >       .needed = virtio_balloon_page_poison_support,
> > > 
> > 
> > Oh, that's very subtle. I wasn't even aware that the prefix really has to
> > match the actual device ... I thought the whole idea of the prefix here was
> > just to make the string unique ...
> 
> Subsection naming is *very* critical; the logic is something like:
>   'we're loading the X device'
> a subsection arrives for 'N/P'
> if 'X==N' then it looks in X for subsection P.
> If 'X!=N' then it assumes we've finished loading X
> and P is really for an outer device that X is part of.
> This is horrible.

Is there value in making this more explicit via a code convention
for .name field initializers. eg instead of

   .name = "virtio-balloon-device/page-poison",

Prefer

   .name = TYPE_VIRTIO_BALLOON "/page-poison"

?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Philippe Mathieu-Daudé 1 week, 3 days ago
On 9/14/21 3:58 PM, Daniel P. Berrangé wrote:
> On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote:
>> * David Hildenbrand (david@redhat.com) wrote:
>>> On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>
>>>> The subsection name for page-poison was typo'd as:
>>>>
>>>>    vitio-balloon-device/page-poison
>>>>
>>>> Note the missing 'r' in virtio.
>>>>
>>>> When we have a machine type that enables page poison, and the guest
>>>> enables it (which needs a new kernel), things fail rather unpredictably.
>>>>
>>>> The fallout from this is that most of the other subsections fail to
>>>> load, including things like the feature bits in the device, one
>>>> possible fallout is that the physical addresses of the queues
>>>> then get aligned differently and we fail with an error about
>>>> last_avail_idx being wrong.
>>>> It's not obvious to me why this doesn't produce a more obvious failure,
>>>> but virtio's vmstate loading is a bit open-coded.
>>>>
>>>> Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
>>>> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
>>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> ---
>>>>   hw/virtio/virtio-balloon.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>>>> index 5a69dce35d..c6962fcbfe 100644
>>>> --- a/hw/virtio/virtio-balloon.c
>>>> +++ b/hw/virtio/virtio-balloon.c
>>>> @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
>>>>   };
>>>>   static const VMStateDescription vmstate_virtio_balloon_page_poison = {
>>>> -    .name = "vitio-balloon-device/page-poison",
>>>> +    .name = "virtio-balloon-device/page-poison",
>>>>       .version_id = 1,
>>>>       .minimum_version_id = 1,
>>>>       .needed = virtio_balloon_page_poison_support,
>>>>
>>>
>>> Oh, that's very subtle. I wasn't even aware that the prefix really has to
>>> match the actual device ... I thought the whole idea of the prefix here was
>>> just to make the string unique ...
>>
>> Subsection naming is *very* critical; the logic is something like:
>>   'we're loading the X device'
>> a subsection arrives for 'N/P'
>> if 'X==N' then it looks in X for subsection P.
>> If 'X!=N' then it assumes we've finished loading X
>> and P is really for an outer device that X is part of.
>> This is horrible.
> 
> Is there value in making this more explicit via a code convention
> for .name field initializers. eg instead of
> 
>    .name = "virtio-balloon-device/page-poison",
> 
> Prefer
> 
>    .name = TYPE_VIRTIO_BALLOON "/page-poison"
> 
> ?

IIUC so far only user-creatable devices are required to have
a stable name in the TYPE definition (because CLI must stay
stable). Which is why this type is not recommended for
migration section names.


Re: [PATCH] virtio-balloon: Fix page-poison subsection name

Posted by Dr. David Alan Gilbert 1 week, 3 days ago
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > The subsection name for page-poison was typo'd as:
> > > > 
> > > >    vitio-balloon-device/page-poison
> > > > 
> > > > Note the missing 'r' in virtio.
> > > > 
> > > > When we have a machine type that enables page poison, and the guest
> > > > enables it (which needs a new kernel), things fail rather unpredictably.
> > > > 
> > > > The fallout from this is that most of the other subsections fail to
> > > > load, including things like the feature bits in the device, one
> > > > possible fallout is that the physical addresses of the queues
> > > > then get aligned differently and we fail with an error about
> > > > last_avail_idx being wrong.
> > > > It's not obvious to me why this doesn't produce a more obvious failure,
> > > > but virtio's vmstate loading is a bit open-coded.
> > > > 
> > > > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature")
> > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >   hw/virtio/virtio-balloon.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > > > index 5a69dce35d..c6962fcbfe 100644
> > > > --- a/hw/virtio/virtio-balloon.c
> > > > +++ b/hw/virtio/virtio-balloon.c
> > > > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = {
> > > >   };
> > > >   static const VMStateDescription vmstate_virtio_balloon_page_poison = {
> > > > -    .name = "vitio-balloon-device/page-poison",
> > > > +    .name = "virtio-balloon-device/page-poison",
> > > >       .version_id = 1,
> > > >       .minimum_version_id = 1,
> > > >       .needed = virtio_balloon_page_poison_support,
> > > > 
> > > 
> > > Oh, that's very subtle. I wasn't even aware that the prefix really has to
> > > match the actual device ... I thought the whole idea of the prefix here was
> > > just to make the string unique ...
> > 
> > Subsection naming is *very* critical; the logic is something like:
> >   'we're loading the X device'
> > a subsection arrives for 'N/P'
> > if 'X==N' then it looks in X for subsection P.
> > If 'X!=N' then it assumes we've finished loading X
> > and P is really for an outer device that X is part of.
> > This is horrible.
> 
> Is there value in making this more explicit via a code convention
> for .name field initializers. eg instead of
> 
>    .name = "virtio-balloon-device/page-poison",
> 
> Prefer
> 
>    .name = TYPE_VIRTIO_BALLOON "/page-poison"
> 
> ?

I think it might be better to see if the vmstate code can check it and
error during saving; certainly this case feels detectable, but I need
to keep an eye open for all the other weird cases.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK