From: Prasad Pandit <pjp@fedoraproject.org>
During migration, the precopy and the multifd channels
send magic value (4-bytes) to the destination side,
for it to identify the channel and properly establish
connection. But Postcopy channel did not send such value.
Introduce a magic value to be sent on the postcopy channel.
It helps to identify channels when both multifd and postcopy
migration are enabled together. An explicitly defined magic
value makes code easier to follow, because it is consistent
with the other channels.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
migration/postcopy-ram.c | 7 +++++++
migration/postcopy-ram.h | 3 +++
2 files changed, 10 insertions(+)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 83f6160a36..5bc19b541c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1580,6 +1580,9 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
{
+ if (mis->postcopy_qemufile_dst) {
+ return;
+ }
/*
* The new loading channel has its own threads, so it needs to be
* blocked too. It's by default true, just be explicit.
@@ -1612,6 +1615,10 @@ postcopy_preempt_send_channel_done(MigrationState *s,
* postcopy_qemufile_src to know whether it failed or not.
*/
qemu_sem_post(&s->postcopy_qemufile_src_sem);
+
+ /* Send magic value to identify postcopy channel on the destination */
+ uint32_t magic = cpu_to_be32(POSTCOPY_MAGIC);
+ qio_channel_write_all(ioc, (char *)&magic, sizeof(magic), NULL);
}
static void
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index a6df1b2811..49e2982558 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -15,6 +15,9 @@
#include "qapi/qapi-types-migration.h"
+/* Magic value to identify postcopy channel on the destination */
+#define POSTCOPY_MAGIC 0x55667788U
+
/* Return true if the host supports everything we need to do postcopy-ram */
bool postcopy_ram_supported_by_host(MigrationIncomingState *mis,
Error **errp);
--
2.47.0
Prasad, On Tue, Oct 29, 2024 at 08:39:05PM +0530, Prasad Pandit wrote: > @@ -1612,6 +1615,10 @@ postcopy_preempt_send_channel_done(MigrationState *s, > * postcopy_qemufile_src to know whether it failed or not. > */ > qemu_sem_post(&s->postcopy_qemufile_src_sem); > + > + /* Send magic value to identify postcopy channel on the destination */ > + uint32_t magic = cpu_to_be32(POSTCOPY_MAGIC); > + qio_channel_write_all(ioc, (char *)&magic, sizeof(magic), NULL); As we discussed internally, we can't do this unconditionally. We at least some compat properties. Or we need to see whether Fabiano's handshake can simplify this, because the handshake will also re-design the channel establishment protocol. Thanks, > } > > static void > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h > index a6df1b2811..49e2982558 100644 > --- a/migration/postcopy-ram.h > +++ b/migration/postcopy-ram.h > @@ -15,6 +15,9 @@ > > #include "qapi/qapi-types-migration.h" > > +/* Magic value to identify postcopy channel on the destination */ > +#define POSTCOPY_MAGIC 0x55667788U > + > /* Return true if the host supports everything we need to do postcopy-ram */ > bool postcopy_ram_supported_by_host(MigrationIncomingState *mis, > Error **errp); > -- > 2.47.0 > -- Peter Xu
On Fri, 1 Nov 2024 at 20:04, Peter Xu <peterx@redhat.com> wrote: > As we discussed internally, we can't do this unconditionally. We at least > some compat properties. * ie. we define a new compat property to check if postcopy sends a magic value or not? > Or we need to see whether Fabiano's handshake can > simplify this, because the handshake will also re-design the channel > establishment protocol. * May I know more about this handshake change? Is there a repository? OR a page/document that describes what is being planned? Is it just channel establishment change or there's more to it? Thank you. --- - Prasad
On Mon, Nov 04, 2024 at 06:02:33PM +0530, Prasad Pandit wrote: > On Fri, 1 Nov 2024 at 20:04, Peter Xu <peterx@redhat.com> wrote: > > As we discussed internally, we can't do this unconditionally. We at least > > some compat properties. > > * ie. we define a new compat property to check if postcopy sends a > magic value or not? > > > Or we need to see whether Fabiano's handshake can > > simplify this, because the handshake will also re-design the channel > > establishment protocol. > > * May I know more about this handshake change? Is there a repository? > OR a page/document that describes what is being planned? Is it just > channel establishment change or there's more to it? After a 2nd thought, maybe we don't need a new compat property, and this should work even before handshake ready. Firstly, we'll need a way to tell mgmt that the new qemu binary supports enablement of both multifd + postcopy feature. That can be done with a "features": [ "postcopy-with-multifd-precopy" ] Flag attached to the QMP "migrate" command. Then, I think we don't need a magic for preempt channel, because new qemu binaries (after 7.2) have no issue on out-of-order connections between main / preempt channel. Preempt channel is always connected later than main. It means in the test logic of "which channel is which", it should be: - If it's a multifd channel (check multifd header match), it's a multifd channel, otherwise, - if main channel is not present yet, it must be the main channel (and we can double check the main channel magic), otherwise, - it's the preempt channel With that, I think we can drop the new magic sent here. Thanks, -- Peter Xu
On Mon, 4 Nov 2024 at 22:48, Peter Xu <peterx@redhat.com> wrote: > Firstly, we'll need a way to tell mgmt that the new qemu binary supports > enablement of both multifd + postcopy feature. That can be done with a > > "features": [ "postcopy-with-multifd-precopy" ] > > Flag attached to the QMP "migrate" command. * IIUC, currently virsh(1)/libvirtd(8) sends the migration command to the destination to inform it of the migration features to use, whether to use multifd or postcopy or none etc. Based on that the destination QEMU prepares to accept incoming VM. Not sure how/what above flag shall benefit. > Then, I think we don't need a magic for preempt channel, because new qemu > binaries (after 7.2) have no issue on out-of-order connections between main > / preempt channel. Preempt channel is always connected later than main. > > It means in the test logic of "which channel is which", it should be: > > - If it's a multifd channel (check multifd header match), it's a multifd > channel, otherwise, > > - if main channel is not present yet, it must be the main channel (and > we can double check the main channel magic), otherwise, > > - it's the preempt channel > > With that, I think we can drop the new magic sent here. * Sending magic value on the postcopy channel only makes it consistent with other channels. There's no harm in sending it. Explicitly defining/sending the magic value is better than leaving it for the code/reader to figure it out. Is there a compelling reason to drop it? IMO, we should either define/send the magic value for all channels or none. Both ways are consistent. Defining it for few and not for others does not seem right. Thank you. --- - Prasad
On Tue, Nov 05, 2024 at 04:49:23PM +0530, Prasad Pandit wrote: > On Mon, 4 Nov 2024 at 22:48, Peter Xu <peterx@redhat.com> wrote: > > Firstly, we'll need a way to tell mgmt that the new qemu binary supports > > enablement of both multifd + postcopy feature. That can be done with a > > > > "features": [ "postcopy-with-multifd-precopy" ] > > > > Flag attached to the QMP "migrate" command. > > * IIUC, currently virsh(1)/libvirtd(8) sends the migration command to > the destination to inform it of the migration features to use, whether > to use multifd or postcopy or none etc. Based on that the destination > QEMU prepares to accept incoming VM. Not sure how/what above flag > shall benefit. See: https://www.qemu.org/docs/master/devel/qapi-code-gen.html Sometimes, the behaviour of QEMU changes compatibly, but without a change in the QMP syntax (usually by allowing values or operations that previously resulted in an error). QMP clients may still need to know whether the extension is available. For this purpose, a list of features can be specified for definitions, enumeration values, and struct members. Each feature list member can either be { 'name': STRING, '*if': COND }, or STRING, which is shorthand for { 'name': STRING }. > > > Then, I think we don't need a magic for preempt channel, because new qemu > > binaries (after 7.2) have no issue on out-of-order connections between main > > / preempt channel. Preempt channel is always connected later than main. > > > > It means in the test logic of "which channel is which", it should be: > > > > - If it's a multifd channel (check multifd header match), it's a multifd > > channel, otherwise, > > > > - if main channel is not present yet, it must be the main channel (and > > we can double check the main channel magic), otherwise, > > > > - it's the preempt channel > > > > With that, I think we can drop the new magic sent here. > > * Sending magic value on the postcopy channel only makes it consistent > with other channels. There's no harm in sending it. Explicitly > defining/sending the magic value is better than leaving it for the > code/reader to figure it out. Is there a compelling reason to drop it? > IMO, we should either define/send the magic value for all channels or > none. Both ways are consistent. Defining it for few and not for others > does not seem right. It's a legacy issue as not all features are developed together, and that was planned to be fixed together with handshake. I think the handshake could introduce one header on top to pair channels. IMHO it is an overkill to add a feature now if it works even if tricky, because it's not the 1st day it was tricky. And we're going to have another header very soon.. Thanks, -- Peter Xu
On Tue, 5 Nov 2024 at 18:30, Peter Xu <peterx@redhat.com> wrote: > https://www.qemu.org/docs/master/devel/qapi-code-gen.html > > Sometimes, the behaviour of QEMU changes compatibly, but without a > change in the QMP syntax (usually by allowing values or operations > that previously resulted in an error). QMP clients may still need > to know whether the extension is available. > > For this purpose, a list of features can be specified for > definitions, enumeration values, and struct members. Each feature > list member can either be { 'name': STRING, '*if': COND }, or > STRING, which is shorthand for { 'name': STRING }. * I see, okay. > It's a legacy issue as not all features are developed together, and that > was planned to be fixed together with handshake. I think the handshake > could introduce one header on top to pair channels. > > IMHO it is an overkill to add a feature now if it works even if tricky, > because it's not the 1st day it was tricky. And we're going to have another > header very soon.. * See, current (this series) 'if' conditional in the migration_ioc_process_incoming() function is simple as: if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { peek_magic_bytes() ... } If we don't send magic value for the postcopy channel, then we avoid peeking into magic bytes when postcopy is enabled, because otherwise thread will block peeking into the magic bytes, so the 'if' conditional becomes: if (migrate_multifd() && !migrate_postcopy() && qio_channel_has_feature (...) ) { peek_magic_bytes() ... } else { When migrate_postcopy() is true It'll reach here not only for the 'Postcopy' channel, but even for the 'default' and 'multifd' channels which send the magic bytes. Then here again we'll need to identify different channels, right? } * Let's not make it so complex. Let's send the magic value for the postcopy channel and simplify it. If 'handshake' feature is going to redo it, so be it, what's the difference? OR maybe we can align it with the 'handshake' feature or as part of it or something like that. @Fabiano Rosas : may I know more about the 'handshake' feature? What it'll do and not do? Thank you. --- - Prasad
On Wed, Nov 06, 2024 at 05:49:23PM +0530, Prasad Pandit wrote: > On Tue, 5 Nov 2024 at 18:30, Peter Xu <peterx@redhat.com> wrote: > > https://www.qemu.org/docs/master/devel/qapi-code-gen.html > > > > Sometimes, the behaviour of QEMU changes compatibly, but without a > > change in the QMP syntax (usually by allowing values or operations > > that previously resulted in an error). QMP clients may still need > > to know whether the extension is available. > > > > For this purpose, a list of features can be specified for > > definitions, enumeration values, and struct members. Each feature > > list member can either be { 'name': STRING, '*if': COND }, or > > STRING, which is shorthand for { 'name': STRING }. > > * I see, okay. > > > It's a legacy issue as not all features are developed together, and that > > was planned to be fixed together with handshake. I think the handshake > > could introduce one header on top to pair channels. > > > > IMHO it is an overkill to add a feature now if it works even if tricky, > > because it's not the 1st day it was tricky. And we're going to have another > > header very soon.. > > * See, current (this series) 'if' conditional in the > migration_ioc_process_incoming() function is simple as: > > if (qio_channel_has_feature(ioc, > QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { peek_magic_bytes() ... } IIUC we can't simply change it to this. We can do this with a compat property and we can start sending a magic in the preempt channel, but then the code still needs to keep working with old binaries of QEMU, so in all cases we'll need to keep the old complexity for quite a while. IOW, I think it may break migrations from old QEMUs when postcopy preempt is enabled, because then the new QEMU (with your patch applied) will always peek the byte assuming the magic is there, but old binaries don't have those. Handshake, in my mind, will use a totally separate path, then the hope is we'll move to that with more machine types and finally obsolete / remove this path. > > If we don't send magic value for the postcopy channel, then we avoid > peeking into magic bytes when postcopy is enabled, because otherwise > thread will block peeking into the magic bytes, so the 'if' > conditional becomes: > > if (migrate_multifd() && !migrate_postcopy() && > qio_channel_has_feature (...) ) { > peek_magic_bytes() > ... > } else { > When migrate_postcopy() is true > It'll reach here not only for the 'Postcopy' channel, but even > for the 'default' and 'multifd' channels which send the magic bytes. > Then here again we'll need to identify different channels, right? > } > > * Let's not make it so complex. Let's send the magic value for the Firstly, the complexity is there on migration, requiring it work with old qemu binaries, bi-directional by default. In your case you're changing receiving side, so it's even more important, because it's common old qemu migrates to new ones. What I want to avoid is we introduce two flags in a short period doing the same thing. If we want we can merge the effort, I'll leave that to you and Fabiano to decide, so that maybe you can work out the channel establishment part of things. But note again that I still think your goal of enabling multifd + postcopy may not require that new flag yet, simply because after 7.2 qemu will connect preempt channel later than the main channel. I think logically QEMU can identify which channel is which: the preempt channel must be established in this case after both main thread and multifd threads. Meanwhile, as mentioned above, we still need to make pre-7.2 works in general on migration in most cases (when there's network instability it won't work.. so that's unavoidable).. it's indeed slightly complicated, but hopefully could still work. In all cases, we may want to test postcopy preempt from a 7.1 qemu to the new qemu to keep it working when the patch is ready (and you can already try that with current patch). > postcopy channel and simplify it. If 'handshake' feature is going to > redo it, so be it, what's the difference? OR maybe we can align it > with the 'handshake' feature or as part of it or something like that. > > @Fabiano Rosas : may I know more about the 'handshake' feature? What > it'll do and not do? > > Thank you. > --- > - Prasad > -- Peter Xu
On Wed, 6 Nov 2024 at 21:30, Peter Xu <peterx@redhat.com> wrote: > IIUC we can't simply change it to this. We can do this with a compat > property and we can start sending a magic in the preempt channel, but then > the code still needs to keep working with old binaries of QEMU, so in all > cases we'll need to keep the old complexity for quite a while. * I see...sigh. > Handshake, in my mind, will use a totally separate path, then the hope is > we'll move to that with more machine types and finally obsolete / remove > this path. * I need to check how 'separate path' works. > But note again that I still think your goal of enabling multifd + postcopy > may not require that new flag yet, simply because after 7.2 qemu will > connect preempt channel later than the main channel. I think logically > QEMU can identify which channel is which: the preempt channel must be > established in this case after both main thread and multifd threads. * You mean, instead of relying on magic bytes, we check if both 'main channel' and 'multifd channels' already exist, then the incoming connection is assumed to be for the 'postcopy preempt' channel? Thank you. --- - Prasad
On Thu, Nov 07, 2024 at 05:22:05PM +0530, Prasad Pandit wrote: > On Wed, 6 Nov 2024 at 21:30, Peter Xu <peterx@redhat.com> wrote: > > IIUC we can't simply change it to this. We can do this with a compat > > property and we can start sending a magic in the preempt channel, but then > > the code still needs to keep working with old binaries of QEMU, so in all > > cases we'll need to keep the old complexity for quite a while. > > * I see...sigh. > > > Handshake, in my mind, will use a totally separate path, then the hope is > > we'll move to that with more machine types and finally obsolete / remove > > this path. > > * I need to check how 'separate path' works. The plan is handshake will only happen on the main channel, and that includes waiting all the channels to be established. There, each channel may need to have its own header, it could be a new handshake header that whatever migration channel will start to establish, so it's named channels and dest qemu handshake logic can "understand" which channel is which, and properly assign those channels in the handshake.c logic, for example. On src, now we kick off migration by migration_should_start_incoming() returns true, only relying on "whether qemu have all the channels ready". The handshake code can do more, so it checks more, then after all handshake ready it can directly invoke migration_incoming_process() in the separate path, to which stage it also needs to guarantee channel establishment. We'll need to keep this path though for "if (!migrate_handshake())". > > > But note again that I still think your goal of enabling multifd + postcopy > > may not require that new flag yet, simply because after 7.2 qemu will > > connect preempt channel later than the main channel. I think logically > > QEMU can identify which channel is which: the preempt channel must be > > established in this case after both main thread and multifd threads. > > * You mean, instead of relying on magic bytes, we check if both 'main > channel' and 'multifd channels' already exist, then the incoming > connection is assumed to be for the 'postcopy preempt' channel? Exactly. So we keep the fact that we should only peek() if multifd is enabled at least. Then in your case even postcopy is also enabled, it won't connect the preempt channel until very late (entry of postcopy_start()), and it'll only connect if it receives a PONG first (migration_wait_main_channel()). That means dest has all multifd+main channels ready otherwise no PONG will generate. This makes sure we'll never try to peek() on preempt channel (which may hang) as long as it's always the latest. I think 7.1 will keep working even if it behaves differently (it connects preempt channel earlier, see preempt_pre_7_2), because the 1st requirement in the new code (and also, in the old code) you will also only peek() if multifd enabled first, while multifd cannot be enabled before with postcopy/preempt or it was already a mess, so we can imagine old users only enable either multifd or postcopy. Thanks, -- Peter Xu
Prasad Pandit <ppandit@redhat.com> writes: > On Tue, 5 Nov 2024 at 18:30, Peter Xu <peterx@redhat.com> wrote: >> https://www.qemu.org/docs/master/devel/qapi-code-gen.html >> >> Sometimes, the behaviour of QEMU changes compatibly, but without a >> change in the QMP syntax (usually by allowing values or operations >> that previously resulted in an error). QMP clients may still need >> to know whether the extension is available. >> >> For this purpose, a list of features can be specified for >> definitions, enumeration values, and struct members. Each feature >> list member can either be { 'name': STRING, '*if': COND }, or >> STRING, which is shorthand for { 'name': STRING }. > > * I see, okay. > >> It's a legacy issue as not all features are developed together, and that >> was planned to be fixed together with handshake. I think the handshake >> could introduce one header on top to pair channels. >> >> IMHO it is an overkill to add a feature now if it works even if tricky, >> because it's not the 1st day it was tricky. And we're going to have another >> header very soon.. > > * See, current (this series) 'if' conditional in the > migration_ioc_process_incoming() function is simple as: > > if (qio_channel_has_feature(ioc, > QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { peek_magic_bytes() ... } > > If we don't send magic value for the postcopy channel, then we avoid > peeking into magic bytes when postcopy is enabled, because otherwise > thread will block peeking into the magic bytes, so the 'if' > conditional becomes: > > if (migrate_multifd() && !migrate_postcopy() && > qio_channel_has_feature (...) ) { > peek_magic_bytes() > ... > } else { > When migrate_postcopy() is true > It'll reach here not only for the 'Postcopy' channel, but even > for the 'default' and 'multifd' channels which send the magic bytes. > Then here again we'll need to identify different channels, right? > } > > * Let's not make it so complex. Let's send the magic value for the > postcopy channel and simplify it. If 'handshake' feature is going to > redo it, so be it, what's the difference? OR maybe we can align it > with the 'handshake' feature or as part of it or something like that. > > @Fabiano Rosas : may I know more about the 'handshake' feature? What > it'll do and not do? What we're thinking is having an initial exchange of information between src & dst as soon as migration starts and that would sync the capabilities and parameters between both sides. Which would then be followed by a channel establishment phase that would open each necessary channel (according to caps) in order, removing the current ambiguity. > > Thank you. > --- > - Prasad
On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote: > What we're thinking is having an initial exchange of information between > src & dst as soon as migration starts and that would sync the > capabilities and parameters between both sides. Which would then be > followed by a channel establishment phase that would open each necessary > channel (according to caps) in order, removing the current ambiguity. > * Isn't that how it works? IIUC, libvirtd(8) sends migration command options to the destination and based on that the destination prepares for the multifd and/or postcopy migration. In case of 'Postcopy' the source sends 'postcopy advise' to the destination to indicate that postcopy might follow at the end of precopy. Also, in the discussion above Peter mentioned that libvirtd(8) may exchange list of features between source and destination to facilitate QMP clients. * What is the handshake doing differently? (just trying to understand) Thank you. --- - Prasad
On Thu, Nov 07, 2024 at 05:35:06PM +0530, Prasad Pandit wrote: > On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote: > > What we're thinking is having an initial exchange of information between > > src & dst as soon as migration starts and that would sync the > > capabilities and parameters between both sides. Which would then be > > followed by a channel establishment phase that would open each necessary > > channel (according to caps) in order, removing the current ambiguity. > > > > * Isn't that how it works? IIUC, libvirtd(8) sends migration command > options to the destination and based on that the destination prepares > for the multifd and/or postcopy migration. In case of 'Postcopy' the > source sends 'postcopy advise' to the destination to indicate that > postcopy might follow at the end of precopy. Also, in the discussion > above Peter mentioned that libvirtd(8) may exchange list of features > between source and destination to facilitate QMP clients. > > * What is the handshake doing differently? (just trying to understand) Libvirt does what it does because it has had no other choice, not because it was good or desirable. This kind of handshake really does not belong in libvirt. A number of exposed migration protocol feature knobs should be considered private to QEMU only. It has the very negative consequence that every time QEMU wants to provide a new feature in migration, it needs to be plumbed up through libvirt, and often applications above, and those 3rd party projects need to be told when & where to use the new features. The 3rd party developers have their own project dev priorities so may not get around to enable the new migration features for years, if ever, undermining the work of QEMU's migration maintainers. As examples... If we had QEMU self-negotiation of features 10 years ago, everywhere would already be using multifd out of the box. QEMU would have been able to self-negotiate use of the new "multifd" protocol, and QEMU would be well on its way to being able to delete the old single- threaded migration code. Similarly post-copy would have been way easier for apps, QEMU would auto-negotiate a channel for the post-copy async page fetching. All migrations would be running with the post-copy feature available. All that libvirt & apps would have needed was a API to initiate the switch to post-copy mode. Or the hacks QEMU has put in place where we peek at incoming data on some channels to identify the channel type would not exist. TL;DR: once QEMU can self-negotiate features for migration itself, the implementation burden for libvirt & applications is greatly reduced. QEMU migration maintainers will control their own destiny, able to deliver improvements to users much more quickly, be able to delete obsolete features more quickly, and be able to make migration *automatically* enable new features & pick the optimal defaults on their own expert knowledge, not waitnig for 3rd parties to pay attention years later. Some things will still need work & decisions in libvirt & apps, but this burden should be reduced compared over the long term. Ultimately everyone will win. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Nov 07, 2024 at 12:33:17PM +0000, Daniel P. Berrangé wrote: > On Thu, Nov 07, 2024 at 05:35:06PM +0530, Prasad Pandit wrote: > > On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote: > > > What we're thinking is having an initial exchange of information between > > > src & dst as soon as migration starts and that would sync the > > > capabilities and parameters between both sides. Which would then be > > > followed by a channel establishment phase that would open each necessary > > > channel (according to caps) in order, removing the current ambiguity. > > > > > > > * Isn't that how it works? IIUC, libvirtd(8) sends migration command > > options to the destination and based on that the destination prepares > > for the multifd and/or postcopy migration. In case of 'Postcopy' the > > source sends 'postcopy advise' to the destination to indicate that > > postcopy might follow at the end of precopy. Also, in the discussion > > above Peter mentioned that libvirtd(8) may exchange list of features > > between source and destination to facilitate QMP clients. > > > > * What is the handshake doing differently? (just trying to understand) > > Libvirt does what it does because it has had no other choice, > not because it was good or desirable. > > This kind of handshake really does not belong in libvirt. A number > of exposed migration protocol feature knobs should be considered > private to QEMU only. > > It has the very negative consequence that every time QEMU wants to > provide a new feature in migration, it needs to be plumbed up through > libvirt, and often applications above, and those 3rd party projects > need to be told when & where to use the new features. The 3rd party > developers have their own project dev priorities so may not get > around to enable the new migration features for years, if ever, > undermining the work of QEMU's migration maintainers. > > As examples... > > If we had QEMU self-negotiation of features 10 years ago, everywhere > would already be using multifd out of the box. QEMU would have been > able to self-negotiate use of the new "multifd" protocol, and QEMU > would be well on its way to being able to delete the old single- > threaded migration code. > > Similarly post-copy would have been way easier for apps, QEMU would > auto-negotiate a channel for the post-copy async page fetching. All > migrations would be running with the post-copy feature available. > All that libvirt & apps would have needed was a API to initiate the > switch to post-copy mode. > > Or the hacks QEMU has put in place where we peek at incoming data > on some channels to identify the channel type would not exist. > > > TL;DR: once QEMU can self-negotiate features for migration itself, > the implementation burden for libvirt & applications is greatly > reduced. QEMU migration maintainers will control their own destiny, > able to deliver improvements to users much more quickly, be able > to delete obsolete features more quickly, and be able to make > migration *automatically* enable new features & pick the optimal > defaults on their own expert knowledge, not waitnig for 3rd parties > to pay attention years later. > > Some things will still need work & decisions in libvirt & apps, > but this burden should be reduced compared over the long term. > Ultimately everyone will win. I'll comment on a few examples above, which I think some of them, even if handshake is ready, may still need mgmt layers to involve.. Multifd and postcopy are the two major features, and they, IMHO, all better need user involvements.. Multifd needs it because it relies on the channel being able to provide >1 channels. It means "| nc XXX > file" can stop working if we turn it on by default silently. We also see generic use case in containers now that when there're dedicated cores for vcpus and "no dedicate" / "one" core setup for qemu hypervisor threads, it means anything like multifd or even block layer iothreads can be pure overheads comparing to one thread / one channel does the job.. in those cases they're better disabled. Postcopy, when enabled manually like right now, has one benefit I can think of: when the user will 100% use postcopy, the failure can happen earlier if dest host doesn't even support it! So it'll generate an error upfront saying "postcopy not supported" before migration starts, comparing to fail later at migrate_start_postcopy, then fail that command, but maybe if so the user may not even start the migration at all, knowing workload too busy to converge. It's pretty common that postcopy is not supported on dest due to the fact that people are over cautious on what userfault could do (which I kind of agree.. it's a tricky but powerful feature), so unprivileged_userfaultfd=0 knob alone can disable it for many hosts, afaiu.. simply because kvm will be the default accel, and qemu needs kernel fault trapping capability to make postcopy work.. which requires unprivileged userfaults that qemu process may not have. Maybe the postcopy-preempt mode is the one closest to what we can enable by default, so if it's not a capability we could auto-enalbe it with/without handshake being there. However there's still the tricky part where it starts to require >1 channels, so if it's the default it could break anyone who has a proxy after the socket, for example, iff the proxy only supports one channel.. similar to multifd, but not completely the same. Mapped-ram, definitely needs user involvements, because it will change the image layout. So.. just to say, maybe we'll still need some mgmt help to explicitly enable many of the features after the handshake work, as the mgmt knows better on what is the context of the VM, rather than risking qemu default configs to fail on some existing but working setups. Though at least libvirt will only need to set caps/params only once for each migration. Thanks, -- Peter Xu
On Thu, Nov 07, 2024 at 11:17:30AM -0500, Peter Xu wrote: > On Thu, Nov 07, 2024 at 12:33:17PM +0000, Daniel P. Berrangé wrote: > I'll comment on a few examples above, which I think some of them, even if > handshake is ready, may still need mgmt layers to involve.. > > Multifd and postcopy are the two major features, and they, IMHO, all better > need user involvements.. > > Multifd needs it because it relies on the channel being able to provide >1 > channels. It means "| nc XXX > file" can stop working if we turn it on by > default silently. NB, my point was referring to a hypothetical alternative history, where we had the handshake at the QEMU level from day 1. That would neccessarily imply a bi-directional channel, so the 'nc' use case would already have been out of scope. That said, QEMU could identify whether the channel it was told to use was bi-directional or not, and thus not try to do multifd over a non-socket transport. So the general point still holds - if QEMU had this protocol negotiation phase built-in, there would be more flexiblity in introducing new features without layers above needing changes, for every single one, just a subset. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Nov 07, 2024 at 04:57:46PM +0000, Daniel P. Berrangé wrote: > On Thu, Nov 07, 2024 at 11:17:30AM -0500, Peter Xu wrote: > > On Thu, Nov 07, 2024 at 12:33:17PM +0000, Daniel P. Berrangé wrote: > > I'll comment on a few examples above, which I think some of them, even if > > handshake is ready, may still need mgmt layers to involve.. > > > > Multifd and postcopy are the two major features, and they, IMHO, all better > > need user involvements.. > > > > Multifd needs it because it relies on the channel being able to provide >1 > > channels. It means "| nc XXX > file" can stop working if we turn it on by > > default silently. > > NB, my point was referring to a hypothetical alternative history, > where we had the handshake at the QEMU level from day 1. That > would neccessarily imply a bi-directional channel, so the 'nc' > use case would already have been out of scope. That said, QEMU > could identify whether the channel it was told to use was > bi-directional or not, and thus not try to do multifd over > a non-socket transport. Ah, that's true. > > So the general point still holds - if QEMU had this protocol > negotiation phase built-in, there would be more flexiblity in > introducing new features without layers above needing changes, > for every single one, just a subset. Yes. Just to mention, we can already do that now without handshake, as long as the feature has zero side effect, and as long as we don't expose it as a migration capability (which by default all off). In that case, we can make the property to "on", and add "off" in machine compat properties. IOW, machine compat property can play part of the role as handshake, based on the machine type must be the same when initiating QEMU on both hosts. Thanks, -- Peter Xu
Hello Peter, Dan, Fabiano, Thank you so much for the detailed comments, I appreciate it. On Thu, 7 Nov 2024 at 17:41, Fabiano Rosas <farosas@suse.de> wrote: > The handshake will be a QEMU-only feature. Libvirt will then only start > the migration on src and QEMU will do the capabilities handling. > On Thu, 7 Nov 2024 at 18:03, Daniel P. Berrangé <berrange@redhat.com> wrote: > Libvirt does what it does because it has had no other choice, > not because it was good or desirable. > > This kind of handshake really does not belong in libvirt. A number > of exposed migration protocol feature knobs should be considered > private to QEMU only. * Right, okay. * So then IIUC, libvirtd(8) would invoke migration by sending (without first checking with the destination libvirtd(8)) the migration command to the source QEMU and in qmp_migrate() before setting up the required connections, QEMU will add the feature negotiation (or handshake) step, right? > It has the very negative consequence that every time QEMU wants to > provide a new feature in migration, it needs to be plumbed up through > libvirt, and often applications above, and those 3rd party projects > need to be told when & where to use the new features. The 3rd party > developers have their own project dev priorities so may not get > around to enable the new migration features for years, if ever, > undermining the work of QEMU's migration maintainers. * True. I've seen the mismatch/disconnect between QEMU features and how libvirtd(8)/virsh(1) offers them to the end users. ex. What QEMU calls Multifd, virsh(1) calls --parallel-connections. Features like 'postcopy-preempt-channel' are implemented in QEMU, but no way for an end user to see/enable/disable it from virsh(1) side. * TBH, Multifd is such a misnomer, it could have been a plain simple --threads <N> option. ex: virsh migrate --threads <N>: precopy migration with <N> threads, default <N> = 1. Irrespective of the number of threads, the underlying migration mechanism/protocol remains the same. Threads only help to accelerate the rate of data transfer through multiple connections. We don't have to differentiate channels by sending magic values then. * Since we are thinking about correcting past wrongs, does having a unified migration protocol make sense? OR that is too ambitious a thing to think about? (just checking) * Meanwhile, @Fabiano Rosas If you have not started with the handshake or feature negotiation in QEMU, I'd try it on my side and we can discuss how the handshake should work. If you've already started with it, I'd be happy to help in some way as possible. * Are we thinking about dynamically adjusting migration features based on their availability on both sides? Ex. say source says open 10 parallel connections, but destination can do only 5, then source reports an error and terminates migration or continues with 5 threads/connections OR say user does not mention parallel connections, but still we automatically start multiple threads because both ends support it? Just checking about dynamic adjustments, because earlier while discussing with Peter, he mentioned that we can not enable/disable user supplied options. Thank you. --- - Prasad
Prasad Pandit <ppandit@redhat.com> writes: > Hello Peter, Dan, Fabiano, > > Thank you so much for the detailed comments, I appreciate it. > > On Thu, 7 Nov 2024 at 17:41, Fabiano Rosas <farosas@suse.de> wrote: >> The handshake will be a QEMU-only feature. Libvirt will then only start >> the migration on src and QEMU will do the capabilities handling. >> > On Thu, 7 Nov 2024 at 18:03, Daniel P. Berrangé <berrange@redhat.com> wrote: >> Libvirt does what it does because it has had no other choice, >> not because it was good or desirable. >> >> This kind of handshake really does not belong in libvirt. A number >> of exposed migration protocol feature knobs should be considered >> private to QEMU only. > > * Right, okay. > > * So then IIUC, libvirtd(8) would invoke migration by sending (without > first checking with the destination libvirtd(8)) the migration command > to the source QEMU and in qmp_migrate() before setting up the required > connections, QEMU will add the feature negotiation (or handshake) > step, right? Yes, but there are still some points to figure out, as Peter mentioned, such as how to handle capabilities for which there is a high chance that libvirt does actually want to control, e.g. multifd. One approach is to just continue to allow migrate-set-caps before qmp-migrate and take those into account during handshake as well. > >> It has the very negative consequence that every time QEMU wants to >> provide a new feature in migration, it needs to be plumbed up through >> libvirt, and often applications above, and those 3rd party projects >> need to be told when & where to use the new features. The 3rd party >> developers have their own project dev priorities so may not get >> around to enable the new migration features for years, if ever, >> undermining the work of QEMU's migration maintainers. > > * True. I've seen the mismatch/disconnect between QEMU features and > how libvirtd(8)/virsh(1) offers them to the end users. ex. What QEMU > calls Multifd, virsh(1) calls --parallel-connections. Features like > 'postcopy-preempt-channel' are implemented in QEMU, but no way for an > end user to see/enable/disable it from virsh(1) side. > > * TBH, Multifd is such a misnomer, it could have been a plain simple > --threads <N> option. > ex: virsh migrate --threads <N>: precopy migration with <N> > threads, default <N> = 1. > > Irrespective of the number of threads, the underlying migration > mechanism/protocol remains the same. Threads only help to accelerate > the rate of data transfer through multiple connections. We don't have > to differentiate channels by sending magic values then. > > * Since we are thinking about correcting past wrongs, does having a > unified migration protocol make sense? OR that is too ambitious a > thing to think about? (just checking) > > * Meanwhile, @Fabiano Rosas If you have not started with the handshake > or feature negotiation in QEMU, I'd try it on my side and we can > discuss how the handshake should work. If you've already started with > it, I'd be happy to help in some way as possible. > I'm putting together a prototype that we can iterate on. I'll let you know as soon as I have something I can share. > * Are we thinking about dynamically adjusting migration features based > on their availability on both sides? Ex. say source says open 10 > parallel connections, but destination can do only 5, then source > reports an error and terminates migration or continues with 5 > threads/connections OR say user does not mention parallel connections, > but still we automatically start multiple threads because both ends > support it? Just checking about dynamic adjustments, because earlier > while discussing with Peter, he mentioned that we can not > enable/disable user supplied options. > > Thank you. > --- > - Prasad
Prasad Pandit <ppandit@redhat.com> writes: > On Wed, 6 Nov 2024 at 18:41, Fabiano Rosas <farosas@suse.de> wrote: >> What we're thinking is having an initial exchange of information between >> src & dst as soon as migration starts and that would sync the >> capabilities and parameters between both sides. Which would then be >> followed by a channel establishment phase that would open each necessary >> channel (according to caps) in order, removing the current ambiguity. >> > > * Isn't that how it works? IIUC, libvirtd(8) sends migration command > options to the destination and based on that the destination prepares > for the multifd and/or postcopy migration. In case of 'Postcopy' the > source sends 'postcopy advise' to the destination to indicate that > postcopy might follow at the end of precopy. Also, in the discussion > above Peter mentioned that libvirtd(8) may exchange list of features > between source and destination to facilitate QMP clients. > > * What is the handshake doing differently? (just trying to understand) The handshake will be a QEMU-only feature. Libvirt will then only start the migration on src and QEMU will do the capabilities handling. > > Thank you. > --- > - Prasad
© 2016 - 2024 Red Hat, Inc.