[v1] Allow to enable multifd and postcopy migration together

[PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

From: Prasad Pandit <pjp@fedoraproject.org>

During migration, the precopy and the multifd channels
send magic value (4-bytes) to the destination side,
for it to identify the channel and properly establish
connection. But Postcopy channel did not send such value.

Introduce a magic value to be sent on the postcopy channel.
It helps to identify channels when both multifd and postcopy
migration are enabled together. An explicitly defined magic
value makes code easier to follow, because it is consistent
with the other channels.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/postcopy-ram.c | 7 +++++++
 migration/postcopy-ram.h | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 83f6160a36..5bc19b541c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1580,6 +1580,9 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
 
 void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 {
+    if (mis->postcopy_qemufile_dst) {
+        return;
+    }
     /*
      * The new loading channel has its own threads, so it needs to be
      * blocked too.  It's by default true, just be explicit.
@@ -1612,6 +1615,10 @@ postcopy_preempt_send_channel_done(MigrationState *s,
      * postcopy_qemufile_src to know whether it failed or not.
      */
     qemu_sem_post(&s->postcopy_qemufile_src_sem);
+
+    /* Send magic value to identify postcopy channel on the destination */
+    uint32_t magic = cpu_to_be32(POSTCOPY_MAGIC);
+    qio_channel_write_all(ioc, (char *)&magic, sizeof(magic), NULL);
 }
 
 static void
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index a6df1b2811..49e2982558 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -15,6 +15,9 @@
 
 #include "qapi/qapi-types-migration.h"
 
+/* Magic value to identify postcopy channel on the destination */
+#define POSTCOPY_MAGIC  0x55667788U
+
 /* Return true if the host supports everything we need to do postcopy-ram */
 bool postcopy_ram_supported_by_host(MigrationIncomingState *mis,
                                     Error **errp);
-- 
2.47.0

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

Prasad,

On Tue, Oct 29, 2024 at 08:39:05PM +0530, Prasad Pandit wrote:
> @@ -1612,6 +1615,10 @@ postcopy_preempt_send_channel_done(MigrationState *s,
>       * postcopy_qemufile_src to know whether it failed or not.
>       */
>      qemu_sem_post(&s->postcopy_qemufile_src_sem);
> +
> +    /* Send magic value to identify postcopy channel on the destination */
> +    uint32_t magic = cpu_to_be32(POSTCOPY_MAGIC);
> +    qio_channel_write_all(ioc, (char *)&magic, sizeof(magic), NULL);

As we discussed internally, we can't do this unconditionally.  We at least
some compat properties.  Or we need to see whether Fabiano's handshake can
simplify this, because the handshake will also re-design the channel
establishment protocol.

Thanks,

>  }
>  
>  static void
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index a6df1b2811..49e2982558 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -15,6 +15,9 @@
>  
>  #include "qapi/qapi-types-migration.h"
>  
> +/* Magic value to identify postcopy channel on the destination */
> +#define POSTCOPY_MAGIC  0x55667788U
> +
>  /* Return true if the host supports everything we need to do postcopy-ram */
>  bool postcopy_ram_supported_by_host(MigrationIncomingState *mis,
>                                      Error **errp);
> -- 
> 2.47.0
> 

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

On Fri, 1 Nov 2024 at 20:04, Peter Xu <peterx@redhat.com> wrote:
> As we discussed internally, we can't do this unconditionally.  We at least
> some compat properties.

* ie. we define a new compat property to check if postcopy sends a
magic value or not?

>  Or we need to see whether Fabiano's handshake can
> simplify this, because the handshake will also re-design the channel
> establishment protocol.

* May I know more about this handshake change? Is there a repository?
OR a page/document that describes what is being planned? Is it just
channel establishment change or there's more to it?

Thank you.
---
  - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

On Mon, Nov 04, 2024 at 06:02:33PM +0530, Prasad Pandit wrote:
> On Fri, 1 Nov 2024 at 20:04, Peter Xu <peterx@redhat.com> wrote:
> > As we discussed internally, we can't do this unconditionally.  We at least
> > some compat properties.
> 
> * ie. we define a new compat property to check if postcopy sends a
> magic value or not?
> 
> >  Or we need to see whether Fabiano's handshake can
> > simplify this, because the handshake will also re-design the channel
> > establishment protocol.
> 
> * May I know more about this handshake change? Is there a repository?
> OR a page/document that describes what is being planned? Is it just
> channel establishment change or there's more to it?

After a 2nd thought, maybe we don't need a new compat property, and this
should work even before handshake ready.

Firstly, we'll need a way to tell mgmt that the new qemu binary supports
enablement of both multifd + postcopy feature.  That can be done with a

  "features": [ "postcopy-with-multifd-precopy" ]

Flag attached to the QMP "migrate" command.

Then, I think we don't need a magic for preempt channel, because new qemu
binaries (after 7.2) have no issue on out-of-order connections between main
/ preempt channel.  Preempt channel is always connected later than main.

It means in the test logic of "which channel is which", it should be:

  - If it's a multifd channel (check multifd header match), it's a multifd
    channel, otherwise,

    - if main channel is not present yet, it must be the main channel (and
      we can double check the main channel magic), otherwise,

    - it's the preempt channel

With that, I think we can drop the new magic sent here.

Thanks,

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

On Mon, 4 Nov 2024 at 22:48, Peter Xu <peterx@redhat.com> wrote:
> Firstly, we'll need a way to tell mgmt that the new qemu binary supports
> enablement of both multifd + postcopy feature.  That can be done with a
>
>   "features": [ "postcopy-with-multifd-precopy" ]
>
> Flag attached to the QMP "migrate" command.

* IIUC, currently virsh(1)/libvirtd(8) sends the migration command to
the destination to inform it of the migration features to use, whether
to use multifd or postcopy or none etc. Based on that the destination
QEMU prepares to accept incoming VM. Not sure how/what above flag
shall benefit.

> Then, I think we don't need a magic for preempt channel, because new qemu
> binaries (after 7.2) have no issue on out-of-order connections between main
> / preempt channel.  Preempt channel is always connected later than main.
>
> It means in the test logic of "which channel is which", it should be:
>
>   - If it's a multifd channel (check multifd header match), it's a multifd
>     channel, otherwise,
>
>     - if main channel is not present yet, it must be the main channel (and
>       we can double check the main channel magic), otherwise,
>
>     - it's the preempt channel
>
> With that, I think we can drop the new magic sent here.

* Sending magic value on the postcopy channel only makes it consistent
with other channels. There's no harm in sending it. Explicitly
defining/sending the magic value is better than leaving it for the
code/reader to figure it out. Is there a compelling reason to drop it?
IMO, we should either define/send the magic value for all channels or
none. Both ways are consistent. Defining it for few and not for others
does not seem right.

Thank you.
---
  - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

On Tue, Nov 05, 2024 at 04:49:23PM +0530, Prasad Pandit wrote:
> On Mon, 4 Nov 2024 at 22:48, Peter Xu <peterx@redhat.com> wrote:
> > Firstly, we'll need a way to tell mgmt that the new qemu binary supports
> > enablement of both multifd + postcopy feature.  That can be done with a
> >
> >   "features": [ "postcopy-with-multifd-precopy" ]
> >
> > Flag attached to the QMP "migrate" command.
> 
> * IIUC, currently virsh(1)/libvirtd(8) sends the migration command to
> the destination to inform it of the migration features to use, whether
> to use multifd or postcopy or none etc. Based on that the destination
> QEMU prepares to accept incoming VM. Not sure how/what above flag
> shall benefit.

See:

https://www.qemu.org/docs/master/devel/qapi-code-gen.html

        Sometimes, the behaviour of QEMU changes compatibly, but without a
        change in the QMP syntax (usually by allowing values or operations
        that previously resulted in an error). QMP clients may still need
        to know whether the extension is available.

        For this purpose, a list of features can be specified for
        definitions, enumeration values, and struct members. Each feature
        list member can either be { 'name': STRING, '*if': COND }, or
        STRING, which is shorthand for { 'name': STRING }.

> 
> > Then, I think we don't need a magic for preempt channel, because new qemu
> > binaries (after 7.2) have no issue on out-of-order connections between main
> > / preempt channel.  Preempt channel is always connected later than main.
> >
> > It means in the test logic of "which channel is which", it should be:
> >
> >   - If it's a multifd channel (check multifd header match), it's a multifd
> >     channel, otherwise,
> >
> >     - if main channel is not present yet, it must be the main channel (and
> >       we can double check the main channel magic), otherwise,
> >
> >     - it's the preempt channel
> >
> > With that, I think we can drop the new magic sent here.
> 
> * Sending magic value on the postcopy channel only makes it consistent
> with other channels. There's no harm in sending it. Explicitly
> defining/sending the magic value is better than leaving it for the
> code/reader to figure it out. Is there a compelling reason to drop it?
> IMO, we should either define/send the magic value for all channels or
> none. Both ways are consistent. Defining it for few and not for others
> does not seem right.

It's a legacy issue as not all features are developed together, and that
was planned to be fixed together with handshake.  I think the handshake
could introduce one header on top to pair channels.

IMHO it is an overkill to add a feature now if it works even if tricky,
because it's not the 1st day it was tricky. And we're going to have another
header very soon..

Thanks,

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

On Tue, 5 Nov 2024 at 18:30, Peter Xu <peterx@redhat.com> wrote:
> https://www.qemu.org/docs/master/devel/qapi-code-gen.html
>
>         Sometimes, the behaviour of QEMU changes compatibly, but without a
>         change in the QMP syntax (usually by allowing values or operations
>         that previously resulted in an error). QMP clients may still need
>         to know whether the extension is available.
>
>         For this purpose, a list of features can be specified for
>         definitions, enumeration values, and struct members. Each feature
>         list member can either be { 'name': STRING, '*if': COND }, or
>         STRING, which is shorthand for { 'name': STRING }.

* I see, okay.

> It's a legacy issue as not all features are developed together, and that
> was planned to be fixed together with handshake.  I think the handshake
> could introduce one header on top to pair channels.
>
> IMHO it is an overkill to add a feature now if it works even if tricky,
> because it's not the 1st day it was tricky. And we're going to have another
> header very soon..

* See, current (this series)  'if'  conditional in the
migration_ioc_process_incoming() function is simple as:

    if (qio_channel_has_feature(ioc,
QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { peek_magic_bytes() ... }

If we don't send magic value for the postcopy channel, then we avoid
peeking into magic bytes when postcopy is enabled, because otherwise
thread will block peeking into the magic bytes, so the 'if'
conditional becomes:

    if (migrate_multifd() && !migrate_postcopy() &&
qio_channel_has_feature (...) ) {
        peek_magic_bytes()
        ...
    } else {
       When migrate_postcopy() is true
       It'll reach here not only for the 'Postcopy' channel, but even
for the 'default' and 'multifd' channels which send the magic bytes.
Then here again we'll need to identify different channels, right?
    }

* Let's not make it so complex. Let's send the magic value for the
postcopy channel and simplify it. If 'handshake' feature is going to
redo it, so be it, what's the difference? OR maybe we can align it
with the 'handshake' feature or as part of it or something like that.

@Fabiano Rosas : may I know more about the 'handshake' feature? What
it'll do and not do?

Thank you.
---
  - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

On Wed, Nov 06, 2024 at 05:49:23PM +0530, Prasad Pandit wrote:
> On Tue, 5 Nov 2024 at 18:30, Peter Xu <peterx@redhat.com> wrote:
> > https://www.qemu.org/docs/master/devel/qapi-code-gen.html
> >
> >         Sometimes, the behaviour of QEMU changes compatibly, but without a
> >         change in the QMP syntax (usually by allowing values or operations
> >         that previously resulted in an error). QMP clients may still need
> >         to know whether the extension is available.
> >
> >         For this purpose, a list of features can be specified for
> >         definitions, enumeration values, and struct members. Each feature
> >         list member can either be { 'name': STRING, '*if': COND }, or
> >         STRING, which is shorthand for { 'name': STRING }.
> 
> * I see, okay.
> 
> > It's a legacy issue as not all features are developed together, and that
> > was planned to be fixed together with handshake.  I think the handshake
> > could introduce one header on top to pair channels.
> >
> > IMHO it is an overkill to add a feature now if it works even if tricky,
> > because it's not the 1st day it was tricky. And we're going to have another
> > header very soon..
> 
> * See, current (this series)  'if'  conditional in the
> migration_ioc_process_incoming() function is simple as:
> 
>     if (qio_channel_has_feature(ioc,
> QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { peek_magic_bytes() ... }

IIUC we can't simply change it to this.  We can do this with a compat
property and we can start sending a magic in the preempt channel, but then
the code still needs to keep working with old binaries of QEMU, so in all
cases we'll need to keep the old complexity for quite a while.

IOW, I think it may break migrations from old QEMUs when postcopy preempt
is enabled, because then the new QEMU (with your patch applied) will always
peek the byte assuming the magic is there, but old binaries don't have
those.

Handshake, in my mind, will use a totally separate path, then the hope is
we'll move to that with more machine types and finally obsolete / remove
this path.

> 
> If we don't send magic value for the postcopy channel, then we avoid
> peeking into magic bytes when postcopy is enabled, because otherwise
> thread will block peeking into the magic bytes, so the 'if'
> conditional becomes:
> 
>     if (migrate_multifd() && !migrate_postcopy() &&
> qio_channel_has_feature (...) ) {
>         peek_magic_bytes()
>         ...
>     } else {
>        When migrate_postcopy() is true
>        It'll reach here not only for the 'Postcopy' channel, but even
> for the 'default' and 'multifd' channels which send the magic bytes.
> Then here again we'll need to identify different channels, right?
>     }
> 
> * Let's not make it so complex. Let's send the magic value for the

Firstly, the complexity is there on migration, requiring it work with old
qemu binaries, bi-directional by default.  In your case you're changing
receiving side, so it's even more important, because it's common old qemu
migrates to new ones.

What I want to avoid is we introduce two flags in a short period doing the
same thing.  If we want we can merge the effort, I'll leave that to you and
Fabiano to decide, so that maybe you can work out the channel establishment
part of things.

But note again that I still think your goal of enabling multifd + postcopy
may not require that new flag yet, simply because after 7.2 qemu will
connect preempt channel later than the main channel.  I think logically
QEMU can identify which channel is which: the preempt channel must be
established in this case after both main thread and multifd threads.

Meanwhile, as mentioned above, we still need to make pre-7.2 works in
general on migration in most cases (when there's network instability it
won't work.. so that's unavoidable)..  it's indeed slightly complicated,
but hopefully could still work.

In all cases, we may want to test postcopy preempt from a 7.1 qemu to the
new qemu to keep it working when the patch is ready (and you can already
try that with current patch).

> postcopy channel and simplify it. If 'handshake' feature is going to
> redo it, so be it, what's the difference? OR maybe we can align it
> with the 'handshake' feature or as part of it or something like that.
> 
> @Fabiano Rosas : may I know more about the 'handshake' feature? What
> it'll do and not do?
> 
> Thank you.
> ---
>   - Prasad
> 

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

On Wed, 6 Nov 2024 at 21:30, Peter Xu <peterx@redhat.com> wrote:
> IIUC we can't simply change it to this.  We can do this with a compat
> property and we can start sending a magic in the preempt channel, but then
> the code still needs to keep working with old binaries of QEMU, so in all
> cases we'll need to keep the old complexity for quite a while.

* I see...sigh.

> Handshake, in my mind, will use a totally separate path, then the hope is
> we'll move to that with more machine types and finally obsolete / remove
> this path.

* I need to check how 'separate path' works.

> But note again that I still think your goal of enabling multifd + postcopy
> may not require that new flag yet, simply because after 7.2 qemu will
> connect preempt channel later than the main channel.  I think logically
> QEMU can identify which channel is which: the preempt channel must be
> established in this case after both main thread and multifd threads.

* You mean, instead of relying on magic bytes, we check if both 'main
channel' and 'multifd channels' already exist, then the incoming
connection is assumed to be for the 'postcopy preempt' channel?

Thank you.
---
  - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

On Thu, Nov 07, 2024 at 05:22:05PM +0530, Prasad Pandit wrote:
> On Wed, 6 Nov 2024 at 21:30, Peter Xu <peterx@redhat.com> wrote:
> > IIUC we can't simply change it to this.  We can do this with a compat
> > property and we can start sending a magic in the preempt channel, but then
> > the code still needs to keep working with old binaries of QEMU, so in all
> > cases we'll need to keep the old complexity for quite a while.
> 
> * I see...sigh.
> 
> > Handshake, in my mind, will use a totally separate path, then the hope is
> > we'll move to that with more machine types and finally obsolete / remove
> > this path.
> 
> * I need to check how 'separate path' works.

The plan is handshake will only happen on the main channel, and that
includes waiting all the channels to be established.  There, each channel
may need to have its own header, it could be a new handshake header that
whatever migration channel will start to establish, so it's named channels
and dest qemu handshake logic can "understand" which channel is which, and
properly assign those channels in the handshake.c logic, for example.

On src, now we kick off migration by migration_should_start_incoming()
returns true, only relying on "whether qemu have all the channels ready".
The handshake code can do more, so it checks more, then after all handshake
ready it can directly invoke migration_incoming_process() in the separate
path, to which stage it also needs to guarantee channel establishment.

We'll need to keep this path though for "if (!migrate_handshake())". 

> 
> > But note again that I still think your goal of enabling multifd + postcopy
> > may not require that new flag yet, simply because after 7.2 qemu will
> > connect preempt channel later than the main channel.  I think logically
> > QEMU can identify which channel is which: the preempt channel must be
> > established in this case after both main thread and multifd threads.
> 
> * You mean, instead of relying on magic bytes, we check if both 'main
> channel' and 'multifd channels' already exist, then the incoming
> connection is assumed to be for the 'postcopy preempt' channel?

Exactly.

So we keep the fact that we should only peek() if multifd is enabled at
least.  Then in your case even postcopy is also enabled, it won't connect
the preempt channel until very late (entry of postcopy_start()), and it'll
only connect if it receives a PONG first (migration_wait_main_channel()).
That means dest has all multifd+main channels ready otherwise no PONG will
generate.  This makes sure we'll never try to peek() on preempt channel
(which may hang) as long as it's always the latest.

I think 7.1 will keep working even if it behaves differently (it connects
preempt channel earlier, see preempt_pre_7_2), because the 1st requirement
in the new code (and also, in the old code) you will also only peek() if
multifd enabled first, while multifd cannot be enabled before with
postcopy/preempt or it was already a mess, so we can imagine old users only
enable either multifd or postcopy.

Thanks,

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Fabiano Rosas 1 year, 3 months ago

Prasad Pandit <ppandit@redhat.com> writes:

> On Tue, 5 Nov 2024 at 18:30, Peter Xu <peterx@redhat.com> wrote:
>> https://www.qemu.org/docs/master/devel/qapi-code-gen.html
>>
>>         Sometimes, the behaviour of QEMU changes compatibly, but without a
>>         change in the QMP syntax (usually by allowing values or operations
>>         that previously resulted in an error). QMP clients may still need
>>         to know whether the extension is available.
>>
>>         For this purpose, a list of features can be specified for
>>         definitions, enumeration values, and struct members. Each feature
>>         list member can either be { 'name': STRING, '*if': COND }, or
>>         STRING, which is shorthand for { 'name': STRING }.
>
> * I see, okay.
>
>> It's a legacy issue as not all features are developed together, and that
>> was planned to be fixed together with handshake.  I think the handshake
>> could introduce one header on top to pair channels.
>>
>> IMHO it is an overkill to add a feature now if it works even if tricky,
>> because it's not the 1st day it was tricky. And we're going to have another
>> header very soon..
>
> * See, current (this series)  'if'  conditional in the
> migration_ioc_process_incoming() function is simple as:
>
>     if (qio_channel_has_feature(ioc,
> QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { peek_magic_bytes() ... }
>
> If we don't send magic value for the postcopy channel, then we avoid
> peeking into magic bytes when postcopy is enabled, because otherwise
> thread will block peeking into the magic bytes, so the 'if'
> conditional becomes:
>
>     if (migrate_multifd() && !migrate_postcopy() &&
> qio_channel_has_feature (...) ) {
>         peek_magic_bytes()
>         ...
>     } else {
>        When migrate_postcopy() is true
>        It'll reach here not only for the 'Postcopy' channel, but even
> for the 'default' and 'multifd' channels which send the magic bytes.
> Then here again we'll need to identify different channels, right?
>     }
>
> * Let's not make it so complex. Let's send the magic value for the
> postcopy channel and simplify it. If 'handshake' feature is going to
> redo it, so be it, what's the difference? OR maybe we can align it
> with the 'handshake' feature or as part of it or something like that.
>
> @Fabiano Rosas : may I know more about the 'handshake' feature? What
> it'll do and not do?

What we're thinking is having an initial exchange of information between
src & dst as soon as migration starts and that would sync the
capabilities and parameters between both sides. Which would then be
followed by a channel establishment phase that would open each necessary
channel (according to caps) in order, removing the current ambiguity.

>
> Thank you.
> ---
>   - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote:
> What we're thinking is having an initial exchange of information between
> src & dst as soon as migration starts and that would sync the
> capabilities and parameters between both sides. Which would then be
> followed by a channel establishment phase that would open each necessary
> channel (according to caps) in order, removing the current ambiguity.
>

* Isn't that how it works? IIUC, libvirtd(8) sends migration command
options to the destination and based on that the destination prepares
for the multifd and/or postcopy migration. In case of 'Postcopy' the
source sends 'postcopy advise' to the destination to indicate that
postcopy might follow at the end of precopy. Also, in the discussion
above Peter mentioned that libvirtd(8) may exchange list of features
between source and destination to facilitate QMP clients.

* What is the handshake doing differently? (just trying to understand)

Thank you.
---
  - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Daniel P. Berrangé 1 year, 3 months ago

On Thu, Nov 07, 2024 at 05:35:06PM +0530, Prasad Pandit wrote:
> On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote:
> > What we're thinking is having an initial exchange of information between
> > src & dst as soon as migration starts and that would sync the
> > capabilities and parameters between both sides. Which would then be
> > followed by a channel establishment phase that would open each necessary
> > channel (according to caps) in order, removing the current ambiguity.
> >
> 
> * Isn't that how it works? IIUC, libvirtd(8) sends migration command
> options to the destination and based on that the destination prepares
> for the multifd and/or postcopy migration. In case of 'Postcopy' the
> source sends 'postcopy advise' to the destination to indicate that
> postcopy might follow at the end of precopy. Also, in the discussion
> above Peter mentioned that libvirtd(8) may exchange list of features
> between source and destination to facilitate QMP clients.
> 
> * What is the handshake doing differently? (just trying to understand)

Libvirt does what it does because it has had no other choice,
not because it was good or desirable.

This kind of handshake really does not belong in libvirt. A number
of exposed migration protocol feature knobs should be considered
private to QEMU only.

It has the very negative consequence that every time QEMU wants to
provide a new feature in migration, it needs to be plumbed up through
libvirt, and often applications above, and those 3rd party projects
need to be told when & where to use the new features. The 3rd party
developers have their own project dev priorities so may not get
around to enable the new migration features for years, if ever,
undermining the work of QEMU's migration maintainers.

As examples...

If we had QEMU self-negotiation of features 10 years ago, everywhere
would already be using multifd out of the box. QEMU would have been
able to self-negotiate use of the new "multifd" protocol, and QEMU
would be well on its way to being able to delete the old single-
threaded migration code.

Similarly post-copy would have been way easier for apps, QEMU would
auto-negotiate a channel for the post-copy async page fetching. All
migrations would be running with the post-copy feature available.
All that libvirt & apps would have needed was a API to initiate the
switch to post-copy mode.

Or the hacks QEMU has put in place where we peek at incoming data
on some channels  to identify the channel type would not exist.

TL;DR: once QEMU can self-negotiate features for migration itself,
the implementation burden for libvirt & applications is greatly
reduced. QEMU migration maintainers will control their own destiny,
able to deliver improvements to users much more quickly, be able
to delete obsolete features more quickly, and be able to make
migration *automatically* enable new features & pick the optimal
defaults on their own expert knowledge, not waitnig for 3rd parties
to pay attention years later.

Some things will still need work & decisions in libvirt & apps,
but this burden should be reduced compared over the long term.
Ultimately everyone will win.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

On Thu, Nov 07, 2024 at 12:33:17PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 07, 2024 at 05:35:06PM +0530, Prasad Pandit wrote:
> > On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote:
> > > What we're thinking is having an initial exchange of information between
> > > src & dst as soon as migration starts and that would sync the
> > > capabilities and parameters between both sides. Which would then be
> > > followed by a channel establishment phase that would open each necessary
> > > channel (according to caps) in order, removing the current ambiguity.
> > >
> > 
> > * Isn't that how it works? IIUC, libvirtd(8) sends migration command
> > options to the destination and based on that the destination prepares
> > for the multifd and/or postcopy migration. In case of 'Postcopy' the
> > source sends 'postcopy advise' to the destination to indicate that
> > postcopy might follow at the end of precopy. Also, in the discussion
> > above Peter mentioned that libvirtd(8) may exchange list of features
> > between source and destination to facilitate QMP clients.
> > 
> > * What is the handshake doing differently? (just trying to understand)
> 
> Libvirt does what it does because it has had no other choice,
> not because it was good or desirable.
> 
> This kind of handshake really does not belong in libvirt. A number
> of exposed migration protocol feature knobs should be considered
> private to QEMU only.
> 
> It has the very negative consequence that every time QEMU wants to
> provide a new feature in migration, it needs to be plumbed up through
> libvirt, and often applications above, and those 3rd party projects
> need to be told when & where to use the new features. The 3rd party
> developers have their own project dev priorities so may not get
> around to enable the new migration features for years, if ever,
> undermining the work of QEMU's migration maintainers.
> 
> As examples...
> 
> If we had QEMU self-negotiation of features 10 years ago, everywhere
> would already be using multifd out of the box. QEMU would have been
> able to self-negotiate use of the new "multifd" protocol, and QEMU
> would be well on its way to being able to delete the old single-
> threaded migration code.
> 
> Similarly post-copy would have been way easier for apps, QEMU would
> auto-negotiate a channel for the post-copy async page fetching. All
> migrations would be running with the post-copy feature available.
> All that libvirt & apps would have needed was a API to initiate the
> switch to post-copy mode.
> 
> Or the hacks QEMU has put in place where we peek at incoming data
> on some channels  to identify the channel type would not exist.
> 
> 
> TL;DR: once QEMU can self-negotiate features for migration itself,
> the implementation burden for libvirt & applications is greatly
> reduced. QEMU migration maintainers will control their own destiny,
> able to deliver improvements to users much more quickly, be able
> to delete obsolete features more quickly, and be able to make
> migration *automatically* enable new features & pick the optimal
> defaults on their own expert knowledge, not waitnig for 3rd parties
> to pay attention years later.
> 
> Some things will still need work & decisions in libvirt & apps,
> but this burden should be reduced compared over the long term.
> Ultimately everyone will win.

I'll comment on a few examples above, which I think some of them, even if
handshake is ready, may still need mgmt layers to involve..

Multifd and postcopy are the two major features, and they, IMHO, all better
need user involvements..

Multifd needs it because it relies on the channel being able to provide >1
channels.  It means "| nc XXX > file" can stop working if we turn it on by
default silently.

We also see generic use case in containers now that when there're dedicated
cores for vcpus and "no dedicate" / "one" core setup for qemu hypervisor
threads, it means anything like multifd or even block layer iothreads can
be pure overheads comparing to one thread / one channel does the job.. in
those cases they're better disabled.

Postcopy, when enabled manually like right now, has one benefit I can think
of: when the user will 100% use postcopy, the failure can happen earlier if
dest host doesn't even support it!

So it'll generate an error upfront saying "postcopy not supported" before
migration starts, comparing to fail later at migrate_start_postcopy, then
fail that command, but maybe if so the user may not even start the
migration at all, knowing workload too busy to converge.

It's pretty common that postcopy is not supported on dest due to the fact
that people are over cautious on what userfault could do (which I kind of
agree.. it's a tricky but powerful feature), so unprivileged_userfaultfd=0
knob alone can disable it for many hosts, afaiu.. simply because kvm will
be the default accel, and qemu needs kernel fault trapping capability to
make postcopy work.. which requires unprivileged userfaults that qemu
process may not have.

Maybe the postcopy-preempt mode is the one closest to what we can enable by
default, so if it's not a capability we could auto-enalbe it with/without
handshake being there.  However there's still the tricky part where it
starts to require >1 channels, so if it's the default it could break anyone
who has a proxy after the socket, for example, iff the proxy only supports
one channel.. similar to multifd, but not completely the same.

Mapped-ram, definitely needs user involvements, because it will change the
image layout.

So.. just to say, maybe we'll still need some mgmt help to explicitly
enable many of the features after the handshake work, as the mgmt knows
better on what is the context of the VM, rather than risking qemu default
configs to fail on some existing but working setups.  Though at least
libvirt will only need to set caps/params only once for each migration.

Thanks,

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Daniel P. Berrangé 1 year, 3 months ago

On Thu, Nov 07, 2024 at 11:17:30AM -0500, Peter Xu wrote:
> On Thu, Nov 07, 2024 at 12:33:17PM +0000, Daniel P. Berrangé wrote:
> I'll comment on a few examples above, which I think some of them, even if
> handshake is ready, may still need mgmt layers to involve..
> 
> Multifd and postcopy are the two major features, and they, IMHO, all better
> need user involvements..
> 
> Multifd needs it because it relies on the channel being able to provide >1
> channels.  It means "| nc XXX > file" can stop working if we turn it on by
> default silently.

NB, my point was referring to a hypothetical alternative history,
where we had the handshake at the QEMU level from day 1. That
would neccessarily imply a bi-directional channel, so the 'nc'
use case would already have been  out of scope. That said, QEMU
could identify whether the channel it was told to use was
bi-directional or not, and thus not try to do multifd over
a non-socket transport.

So the general point still holds - if QEMU had this protocol
negotiation phase built-in, there would be more flexiblity in
introducing new features without layers above needing changes,
for every single one, just a subset.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Peter Xu 1 year, 3 months ago

On Thu, Nov 07, 2024 at 04:57:46PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 07, 2024 at 11:17:30AM -0500, Peter Xu wrote:
> > On Thu, Nov 07, 2024 at 12:33:17PM +0000, Daniel P. Berrangé wrote:
> > I'll comment on a few examples above, which I think some of them, even if
> > handshake is ready, may still need mgmt layers to involve..
> > 
> > Multifd and postcopy are the two major features, and they, IMHO, all better
> > need user involvements..
> > 
> > Multifd needs it because it relies on the channel being able to provide >1
> > channels.  It means "| nc XXX > file" can stop working if we turn it on by
> > default silently.
> 
> NB, my point was referring to a hypothetical alternative history,
> where we had the handshake at the QEMU level from day 1. That
> would neccessarily imply a bi-directional channel, so the 'nc'
> use case would already have been  out of scope. That said, QEMU
> could identify whether the channel it was told to use was
> bi-directional or not, and thus not try to do multifd over
> a non-socket transport.

Ah, that's true.

> 
> So the general point still holds - if QEMU had this protocol
> negotiation phase built-in, there would be more flexiblity in
> introducing new features without layers above needing changes,
> for every single one, just a subset.

Yes.

Just to mention, we can already do that now without handshake, as long as
the feature has zero side effect, and as long as we don't expose it as a
migration capability (which by default all off).  In that case, we can make
the property to "on", and add "off" in machine compat properties.  IOW,
machine compat property can play part of the role as handshake, based on
the machine type must be the same when initiating QEMU on both hosts.

Thanks,

-- 
Peter Xu

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Prasad Pandit 1 year, 3 months ago

Hello Peter, Dan, Fabiano,

Thank you so much for the detailed comments, I appreciate it.

On Thu, 7 Nov 2024 at 17:41, Fabiano Rosas <farosas@suse.de> wrote:
> The handshake will be a QEMU-only feature. Libvirt will then only start
> the migration on src and QEMU will do the capabilities handling.
>
On Thu, 7 Nov 2024 at 18:03, Daniel P. Berrangé <berrange@redhat.com> wrote:
> Libvirt does what it does because it has had no other choice,
> not because it was good or desirable.
>
> This kind of handshake really does not belong in libvirt. A number
> of exposed migration protocol feature knobs should be considered
> private to QEMU only.

* Right, okay.

* So then IIUC, libvirtd(8) would invoke migration by sending (without
first checking with the destination libvirtd(8)) the migration command
to the source QEMU and in qmp_migrate() before setting up the required
connections, QEMU will add the feature negotiation (or handshake)
step, right?

> It has the very negative consequence that every time QEMU wants to
> provide a new feature in migration, it needs to be plumbed up through
> libvirt, and often applications above, and those 3rd party projects
> need to be told when & where to use the new features. The 3rd party
> developers have their own project dev priorities so may not get
> around to enable the new migration features for years, if ever,
> undermining the work of QEMU's migration maintainers.

* True. I've seen the mismatch/disconnect between QEMU features and
how libvirtd(8)/virsh(1) offers them to the end users. ex. What QEMU
calls Multifd, virsh(1) calls --parallel-connections. Features like
'postcopy-preempt-channel' are implemented in QEMU, but no way for an
end user to see/enable/disable it from virsh(1) side.

* TBH, Multifd is such a misnomer, it could have been a plain simple
--threads <N> option.
    ex: virsh migrate --threads <N>: precopy migration with <N>
threads, default <N> = 1.

   Irrespective of the number of threads, the underlying migration
mechanism/protocol remains the same. Threads only help to accelerate
the rate of data transfer through multiple connections. We don't have
to differentiate channels by sending magic values then.

* Since we are thinking about correcting past wrongs, does having a
unified migration protocol make sense? OR that is too ambitious a
thing to think about? (just checking)

* Meanwhile, @Fabiano Rosas If you have not started with the handshake
or feature negotiation in QEMU, I'd try it on my side and we can
discuss how the handshake should work. If you've already started with
it, I'd be happy to help in some way as possible.

* Are we thinking about dynamically adjusting migration features based
on their availability on both sides? Ex. say source says open 10
parallel connections, but destination can do only 5, then source
reports an error and terminates migration or continues with 5
threads/connections OR say user does not mention parallel connections,
but still we automatically start multiple threads because both ends
support it? Just checking about dynamic adjustments, because earlier
while discussing with Peter, he mentioned that we can not
enable/disable user supplied options.

Thank you.
---
  - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Fabiano Rosas 1 year, 3 months ago

Prasad Pandit <ppandit@redhat.com> writes:

> Hello Peter, Dan, Fabiano,
>
> Thank you so much for the detailed comments, I appreciate it.
>
> On Thu, 7 Nov 2024 at 17:41, Fabiano Rosas <farosas@suse.de> wrote:
>> The handshake will be a QEMU-only feature. Libvirt will then only start
>> the migration on src and QEMU will do the capabilities handling.
>>
> On Thu, 7 Nov 2024 at 18:03, Daniel P. Berrangé <berrange@redhat.com> wrote:
>> Libvirt does what it does because it has had no other choice,
>> not because it was good or desirable.
>>
>> This kind of handshake really does not belong in libvirt. A number
>> of exposed migration protocol feature knobs should be considered
>> private to QEMU only.
>
> * Right, okay.
>
> * So then IIUC, libvirtd(8) would invoke migration by sending (without
> first checking with the destination libvirtd(8)) the migration command
> to the source QEMU and in qmp_migrate() before setting up the required
> connections, QEMU will add the feature negotiation (or handshake)
> step, right?

Yes, but there are still some points to figure out, as Peter mentioned,
such as how to handle capabilities for which there is a high chance that
libvirt does actually want to control, e.g. multifd. One approach is to
just continue to allow migrate-set-caps before qmp-migrate and take
those into account during handshake as well.

>
>> It has the very negative consequence that every time QEMU wants to
>> provide a new feature in migration, it needs to be plumbed up through
>> libvirt, and often applications above, and those 3rd party projects
>> need to be told when & where to use the new features. The 3rd party
>> developers have their own project dev priorities so may not get
>> around to enable the new migration features for years, if ever,
>> undermining the work of QEMU's migration maintainers.
>
> * True. I've seen the mismatch/disconnect between QEMU features and
> how libvirtd(8)/virsh(1) offers them to the end users. ex. What QEMU
> calls Multifd, virsh(1) calls --parallel-connections. Features like
> 'postcopy-preempt-channel' are implemented in QEMU, but no way for an
> end user to see/enable/disable it from virsh(1) side.
>
> * TBH, Multifd is such a misnomer, it could have been a plain simple
> --threads <N> option.
>     ex: virsh migrate --threads <N>: precopy migration with <N>
> threads, default <N> = 1.
>
>    Irrespective of the number of threads, the underlying migration
> mechanism/protocol remains the same. Threads only help to accelerate
> the rate of data transfer through multiple connections. We don't have
> to differentiate channels by sending magic values then.
>
> * Since we are thinking about correcting past wrongs, does having a
> unified migration protocol make sense? OR that is too ambitious a
> thing to think about? (just checking)
>
> * Meanwhile, @Fabiano Rosas If you have not started with the handshake
> or feature negotiation in QEMU, I'd try it on my side and we can
> discuss how the handshake should work. If you've already started with
> it, I'd be happy to help in some way as possible.
>

I'm putting together a prototype that we can iterate on. I'll let you
know as soon as I have something I can share.

> * Are we thinking about dynamically adjusting migration features based
> on their availability on both sides? Ex. say source says open 10
> parallel connections, but destination can do only 5, then source
> reports an error and terminates migration or continues with 5
> threads/connections OR say user does not mention parallel connections,
> but still we automatically start multiple threads because both ends
> support it? Just checking about dynamic adjustments, because earlier
> while discussing with Peter, he mentioned that we can not
> enable/disable user supplied options.
>
> Thank you.
> ---
>   - Prasad

Re: [PATCH 2/5] migration/postcopy: magic value for postcopy channel

Posted by Fabiano Rosas 1 year, 3 months ago

Prasad Pandit <ppandit@redhat.com> writes:

> On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote:
>> What we're thinking is having an initial exchange of information between
>> src & dst as soon as migration starts and that would sync the
>> capabilities and parameters between both sides. Which would then be
>> followed by a channel establishment phase that would open each necessary
>> channel (according to caps) in order, removing the current ambiguity.
>>
>
> * Isn't that how it works? IIUC, libvirtd(8) sends migration command
> options to the destination and based on that the destination prepares
> for the multifd and/or postcopy migration. In case of 'Postcopy' the
> source sends 'postcopy advise' to the destination to indicate that
> postcopy might follow at the end of precopy. Also, in the discussion
> above Peter mentioned that libvirtd(8) may exchange list of features
> between source and destination to facilitate QMP clients.
>
> * What is the handshake doing differently? (just trying to understand)

The handshake will be a QEMU-only feature. Libvirt will then only start
the migration on src and QEMU will do the capabilities handling.

>
> Thank you.
> ---
>   - Prasad