[v1] migration: Fix s390 regressions + migration script

[PATCH 3/7] migration: Document the effect of vmstate_info_nullptr

Posted by Fabiano Rosas 1 year, 1 month ago

The migration stream lacks magic numbers at some key points. It's easy
to mis-parse data. Unfortunately, the VMS_NULLPTR_MARKER continues
with the trend. A '0' byte is ambiguous and could be interpreted as a
valid 0x30.

It is maybe not worth trying to change this while keeping backward
compatibility, so add some words of documentation to clarify.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/vmstate-types.c    | 6 ++++++
 scripts/analyze-migration.py | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index e83bfccb9e..08ed059f87 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -339,6 +339,12 @@ static int put_nullptr(QEMUFile *f, void *pv, size_t size,
 
 const VMStateInfo vmstate_info_nullptr = {
     .name = "uint64",
+
+     /*
+      * Ideally these would actually read/write the size of a pointer,
+      * but we're stuck with just a byte now for backward
+      * compatibility.
+      */
     .get  = get_nullptr,
     .put  = put_nullptr,
 };
diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index f2457b1dde..4292fde424 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -388,12 +388,21 @@ def read(self):
         return self.data
 
 class VMSDFieldUInt(VMSDFieldInt):
+    NULL_PTR_MARKER = 0x30
+
     def __init__(self, desc, file):
         super(VMSDFieldUInt, self).__init__(desc, file)
 
     def read(self):
         super(VMSDFieldUInt, self).read()
         self.data = self.udata
+
+        if self.data == self.NULL_PTR_MARKER:
+            # The migration stream encodes NULL pointers as '0' so any
+            # 0x30 in the stream could be a NULL. There's not much we
+            # can do without breaking backward compatibility.
+            pass
+
         return self.data
 
 class VMSDFieldIntLE(VMSDFieldInt):
-- 
2.35.3

Re: [PATCH 3/7] migration: Document the effect of vmstate_info_nullptr

Posted by Peter Xu 1 year, 1 month ago

On Tue, Jan 07, 2025 at 04:50:21PM -0300, Fabiano Rosas wrote:
> The migration stream lacks magic numbers at some key points. It's easy
> to mis-parse data. Unfortunately, the VMS_NULLPTR_MARKER continues
> with the trend. A '0' byte is ambiguous and could be interpreted as a
> valid 0x30.
> 
> It is maybe not worth trying to change this while keeping backward
> compatibility, so add some words of documentation to clarify.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/vmstate-types.c    | 6 ++++++
>  scripts/analyze-migration.py | 9 +++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
> index e83bfccb9e..08ed059f87 100644
> --- a/migration/vmstate-types.c
> +++ b/migration/vmstate-types.c
> @@ -339,6 +339,12 @@ static int put_nullptr(QEMUFile *f, void *pv, size_t size,
>  
>  const VMStateInfo vmstate_info_nullptr = {
>      .name = "uint64",

Ouch.. So I overlooked this line and this explains why it didn't go via
VMSDFieldGeneric already.

Instead of below comment, do we still have chance to change this to
something like "uint8"?  Then I suppose the script will be able to identify
this properly.

> +
> +     /*
> +      * Ideally these would actually read/write the size of a pointer,
> +      * but we're stuck with just a byte now for backward
> +      * compatibility.
> +      */
>      .get  = get_nullptr,
>      .put  = put_nullptr,
>  };
> diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
> index f2457b1dde..4292fde424 100755
> --- a/scripts/analyze-migration.py
> +++ b/scripts/analyze-migration.py
> @@ -388,12 +388,21 @@ def read(self):
>          return self.data
>  
>  class VMSDFieldUInt(VMSDFieldInt):
> +    NULL_PTR_MARKER = 0x30
> +
>      def __init__(self, desc, file):
>          super(VMSDFieldUInt, self).__init__(desc, file)
>  
>      def read(self):
>          super(VMSDFieldUInt, self).read()
>          self.data = self.udata
> +
> +        if self.data == self.NULL_PTR_MARKER:
> +            # The migration stream encodes NULL pointers as '0' so any
> +            # 0x30 in the stream could be a NULL. There's not much we
> +            # can do without breaking backward compatibility.
> +            pass

So this change doesn't do anything, right?

It'll be weird here having it "uint64" but the super().read() will actually
only read 1 byte..  I assume the oneliner change of s/uint64/uint8/ could
be a replacement of this patch, and I hope that'll work too for the script.
So we will still see a bunch of 0x30s but I assume it's ok.

> +
>          return self.data
>  
>  class VMSDFieldIntLE(VMSDFieldInt):
> -- 
> 2.35.3
> 

-- 
Peter Xu

Re: [PATCH 3/7] migration: Document the effect of vmstate_info_nullptr

Posted by Fabiano Rosas 1 year, 1 month ago

Peter Xu <peterx@redhat.com> writes:

> On Tue, Jan 07, 2025 at 04:50:21PM -0300, Fabiano Rosas wrote:
>> The migration stream lacks magic numbers at some key points. It's easy
>> to mis-parse data. Unfortunately, the VMS_NULLPTR_MARKER continues
>> with the trend. A '0' byte is ambiguous and could be interpreted as a
>> valid 0x30.
>> 
>> It is maybe not worth trying to change this while keeping backward
>> compatibility, so add some words of documentation to clarify.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/vmstate-types.c    | 6 ++++++
>>  scripts/analyze-migration.py | 9 +++++++++
>>  2 files changed, 15 insertions(+)
>> 
>> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
>> index e83bfccb9e..08ed059f87 100644
>> --- a/migration/vmstate-types.c
>> +++ b/migration/vmstate-types.c
>> @@ -339,6 +339,12 @@ static int put_nullptr(QEMUFile *f, void *pv, size_t size,
>>  
>>  const VMStateInfo vmstate_info_nullptr = {
>>      .name = "uint64",
>
> Ouch.. So I overlooked this line and this explains why it didn't go via
> VMSDFieldGeneric already.

Yes, actually I overlooked as well that it should match the size of the
data being handled in the get/put functions.

My comment below is about NULL -> 0x30 that I think should instead be
NULL -> 0x3030303030303030 so we have any chance of looking at this and
identifying it's a NULL pointer. When we write 0x30 it might become
confusing for people reading the scripts output that their stream has a
bunch of '0' in the place where pointers should be. If the MAGIC number
were more identifiable, I could change the script to output (null) or 0x0ULL.

We also don't really have the concept of a pointer, which I suspect
might be the real reason behind all this mess. So we'll see:

0x30
0x30
{
  .some
  .struct
  .here
}
0x30

So all this patch was trying to do is document this situation somehow.

>
> Instead of below comment, do we still have chance to change this to
> something like "uint8"?  Then I suppose the script will be able to identify
> this properly.
>
>> +
>> +     /*
>> +      * Ideally these would actually read/write the size of a pointer,
>> +      * but we're stuck with just a byte now for backward
>> +      * compatibility.
>> +      */
>>      .get  = get_nullptr,
>>      .put  = put_nullptr,
>>  };
>> diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
>> index f2457b1dde..4292fde424 100755
>> --- a/scripts/analyze-migration.py
>> +++ b/scripts/analyze-migration.py
>> @@ -388,12 +388,21 @@ def read(self):
>>          return self.data
>>  
>>  class VMSDFieldUInt(VMSDFieldInt):
>> +    NULL_PTR_MARKER = 0x30
>> +
>>      def __init__(self, desc, file):
>>          super(VMSDFieldUInt, self).__init__(desc, file)
>>  
>>      def read(self):
>>          super(VMSDFieldUInt, self).read()
>>          self.data = self.udata
>> +
>> +        if self.data == self.NULL_PTR_MARKER:
>> +            # The migration stream encodes NULL pointers as '0' so any
>> +            # 0x30 in the stream could be a NULL. There's not much we
>> +            # can do without breaking backward compatibility.
>> +            pass
>
> So this change doesn't do anything, right?
>
> It'll be weird here having it "uint64" but the super().read() will actually
> only read 1 byte..  I assume the oneliner change of s/uint64/uint8/ could
> be a replacement of this patch, and I hope that'll work too for the script.
> So we will still see a bunch of 0x30s but I assume it's ok.
>
>> +
>>          return self.data
>>  
>>  class VMSDFieldIntLE(VMSDFieldInt):
>> -- 
>> 2.35.3
>>

Re: [PATCH 3/7] migration: Document the effect of vmstate_info_nullptr

Posted by Peter Xu 1 year, 1 month ago

On Wed, Jan 08, 2025 at 10:31:05AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Tue, Jan 07, 2025 at 04:50:21PM -0300, Fabiano Rosas wrote:
> >> The migration stream lacks magic numbers at some key points. It's easy
> >> to mis-parse data. Unfortunately, the VMS_NULLPTR_MARKER continues
> >> with the trend. A '0' byte is ambiguous and could be interpreted as a
> >> valid 0x30.
> >> 
> >> It is maybe not worth trying to change this while keeping backward
> >> compatibility, so add some words of documentation to clarify.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  migration/vmstate-types.c    | 6 ++++++
> >>  scripts/analyze-migration.py | 9 +++++++++
> >>  2 files changed, 15 insertions(+)
> >> 
> >> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
> >> index e83bfccb9e..08ed059f87 100644
> >> --- a/migration/vmstate-types.c
> >> +++ b/migration/vmstate-types.c
> >> @@ -339,6 +339,12 @@ static int put_nullptr(QEMUFile *f, void *pv, size_t size,
> >>  
> >>  const VMStateInfo vmstate_info_nullptr = {
> >>      .name = "uint64",
> >
> > Ouch.. So I overlooked this line and this explains why it didn't go via
> > VMSDFieldGeneric already.
> 
> Yes, actually I overlooked as well that it should match the size of the
> data being handled in the get/put functions.
> 
> My comment below is about NULL -> 0x30 that I think should instead be
> NULL -> 0x3030303030303030 so we have any chance of looking at this and
> identifying it's a NULL pointer. When we write 0x30 it might become
> confusing for people reading the scripts output that their stream has a
> bunch of '0' in the place where pointers should be. If the MAGIC number
> were more identifiable, I could change the script to output (null) or 0x0ULL.

I suppose we can?  If we want, by renaming this from "uint64" to "nullptr",
then add an entry for it in Python's vmsd_field_readers.

> 
> We also don't really have the concept of a pointer, which I suspect
> might be the real reason behind all this mess. So we'll see:
> 
> 0x30
> 0x30
> {
>   .some
>   .struct
>   .here
> }
> 0x30
> 
> So all this patch was trying to do is document this situation somehow.

Yes, more docs makes sense, though just to mention it's nothing better here
to use a full size of pointer: firstly it's not possible I think as 32/64
bits have different size of pointers...

More importantly, we're not sending the pointer but a marker, in this case
the size of the real pointer doesn't really matter, IMHO.  A marker would
make sense in saving some bytes when / if the array is large and sparse.

Said that, let's try above idea, maybe it's optimal as you said the script
can show things like "nullptr" (or any better name, I think that's better
than "null" at least to show it's not a real pointer, otherwise it's weird
to see any pointer in a migration stream..).

> 
> >
> > Instead of below comment, do we still have chance to change this to
> > something like "uint8"?  Then I suppose the script will be able to identify
> > this properly.

-- 
Peter Xu

Re: [PATCH 3/7] migration: Document the effect of vmstate_info_nullptr

Posted by Fabiano Rosas 1 year, 1 month ago

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jan 08, 2025 at 10:31:05AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Tue, Jan 07, 2025 at 04:50:21PM -0300, Fabiano Rosas wrote:
>> >> The migration stream lacks magic numbers at some key points. It's easy
>> >> to mis-parse data. Unfortunately, the VMS_NULLPTR_MARKER continues
>> >> with the trend. A '0' byte is ambiguous and could be interpreted as a
>> >> valid 0x30.
>> >> 
>> >> It is maybe not worth trying to change this while keeping backward
>> >> compatibility, so add some words of documentation to clarify.
>> >> 
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> ---
>> >>  migration/vmstate-types.c    | 6 ++++++
>> >>  scripts/analyze-migration.py | 9 +++++++++
>> >>  2 files changed, 15 insertions(+)
>> >> 
>> >> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
>> >> index e83bfccb9e..08ed059f87 100644
>> >> --- a/migration/vmstate-types.c
>> >> +++ b/migration/vmstate-types.c
>> >> @@ -339,6 +339,12 @@ static int put_nullptr(QEMUFile *f, void *pv, size_t size,
>> >>  
>> >>  const VMStateInfo vmstate_info_nullptr = {
>> >>      .name = "uint64",
>> >
>> > Ouch.. So I overlooked this line and this explains why it didn't go via
>> > VMSDFieldGeneric already.
>> 
>> Yes, actually I overlooked as well that it should match the size of the
>> data being handled in the get/put functions.
>> 
>> My comment below is about NULL -> 0x30 that I think should instead be
>> NULL -> 0x3030303030303030 so we have any chance of looking at this and
>> identifying it's a NULL pointer. When we write 0x30 it might become
>> confusing for people reading the scripts output that their stream has a
>> bunch of '0' in the place where pointers should be. If the MAGIC number
>> were more identifiable, I could change the script to output (null) or 0x0ULL.
>
> I suppose we can?  If we want, by renaming this from "uint64" to "nullptr",
> then add an entry for it in Python's vmsd_field_readers.

That would be a nice alternative because it maps NULL to something, just
like the actual stream does. NULL -> '0' in the stream, NULL -> nullptr
in the JSON. I'll give it a try, thanks.

>> 
>> We also don't really have the concept of a pointer, which I suspect
>> might be the real reason behind all this mess. So we'll see:
>> 
>> 0x30
>> 0x30
>> {
>>   .some
>>   .struct
>>   .here
>> }
>> 0x30
>> 
>> So all this patch was trying to do is document this situation somehow.
>
> Yes, more docs makes sense, though just to mention it's nothing better here
> to use a full size of pointer: firstly it's not possible I think as 32/64
> bits have different size of pointers...
>
> More importantly, we're not sending the pointer but a marker, in this case
> the size of the real pointer doesn't really matter, IMHO.  A marker would
> make sense in saving some bytes when / if the array is large and sparse.

Right, it's just that a larger data type allows for a more unique
marker, which can be detected more reliably by the consumers of the
stream. The smaller data type is too ambiguous.

>
> Said that, let's try above idea, maybe it's optimal as you said the script
> can show things like "nullptr" (or any better name, I think that's better
> than "null" at least to show it's not a real pointer, otherwise it's weird
> to see any pointer in a migration stream..).

Yes, the script is just presenting the data, we can use what's more
informative.

>
>> 
>> >
>> > Instead of below comment, do we still have chance to change this to
>> > something like "uint8"?  Then I suppose the script will be able to identify
>> > this properly.

[PATCH 1/7] migration: Add more error handling to analyze-migration.py
[PATCH 2/7] migration: Remove unused argument in vmsd_desc_field_end
[PATCH 3/7] migration: Document the effect of vmstate_info_nullptr
[PATCH 4/7] migration: Fix parsing of s390 stream
[PATCH 5/7] migration: Dump correct JSON format for nullptr replacement
[PATCH 6/7] migration: Fix arrays of pointers in JSON writer
[PATCH 7/7] s390x: Fix CSS migration