[PATCH] qemu_alias: Fix backcompat console alias generation

Michal Privoznik posted 1 patch 1 year, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/5f412e05f4dab43c913214272e91b7261b665822.1678789249.git.mprivozn@redhat.com
src/qemu/qemu_alias.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
[PATCH] qemu_alias: Fix backcompat console alias generation
Posted by Michal Privoznik 1 year, 2 months ago
We have this crazy backwards compatibility when it comes to
serial and console devices. Basically, in same cases the very
first <console/> is just an alias to the very first <serial/>
device. This is to be seen at various places:

1) virDomainDefFormatInternalSetRootName() - when generating
   domain XML, the <console/> configuration is basically ignored
   and corresponding <serial/> config is formatted,

2) virDomainDefAddConsoleCompat() - which adds a copy of
   <serial/> or <console/> into virDomainDef in post parse.

And when talking to QEMU we need a special handling too, because
while <serial/> is generated on the cmd line, the <console/> is
not. And in a lot of place we get it right. Except for generating
device aliases. On domain startup the 'expected' happens and
devices get "serial0" and "console0" aliases, correspondingly.
This ends up in the status XML too. But due to aforementioned
trick when formatting domain XML, "serial0" ends up in both
'virsh dumpxml' and the status XML. But internally, both devices
have different alias. Therefore, detaching the device using
<console/> fails as qemuDomainDetachDeviceChr() tries to detach
"console0".

After the daemon is restarted and status XML is parsed, then
everything works suddenly. This is because in the status XML both
devices have the same alias.

Let's generate correct alias from the beginning.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2156300
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_alias.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c
index a9809797d5..05d25da93d 100644
--- a/src/qemu/qemu_alias.c
+++ b/src/qemu/qemu_alias.c
@@ -90,6 +90,18 @@ qemuAssignDeviceChrAlias(virDomainDef *def,
     if (chr->info.alias)
         return 0;
 
+    /* Some crazy backcompat for consoles. Look into
+     * virDomainDefAddConsoleCompat() for more explanation. */
+    if (chr->deviceType == VIR_DOMAIN_CHR_DEVICE_TYPE_CONSOLE &&
+        chr->targetType == VIR_DOMAIN_CHR_CONSOLE_TARGET_TYPE_SERIAL &&
+        def->os.type == VIR_DOMAIN_OSTYPE_HVM &&
+        def->consoles[0] == chr &&
+        def->nserials &&
+        def->serials[0]->info.alias) {
+        chr->info.alias = g_strdup(def->serials[0]->info.alias);
+        return 0;
+    }
+
     switch ((virDomainChrDeviceType)chr->deviceType) {
     case VIR_DOMAIN_CHR_DEVICE_TYPE_PARALLEL:
         prefix = "parallel";
-- 
2.39.2
Re: [PATCH] qemu_alias: Fix backcompat console alias generation
Posted by Andrea Bolognani 1 year, 2 months ago
On Tue, Mar 14, 2023 at 11:20:49AM +0100, Michal Privoznik wrote:
> We have this crazy backwards compatibility when it comes to
> serial and console devices.

This truly is the gift that keeps on giving :/

> Basically, in same cases the very
> first <console/> is just an alias to the very first <serial/>
> device. This is to be seen at various places:
>
> 1) virDomainDefFormatInternalSetRootName() - when generating
>    domain XML, the <console/> configuration is basically ignored
>    and corresponding <serial/> config is formatted,

Wow, I wasn't aware of that part of the insanity. That got me
confused for a bit: dumpxml on a running domain was showing the
expected alias for the <console>. Now I understand why! It was
just displaying the information for the <serial> instead.

> 2) virDomainDefAddConsoleCompat() - which adds a copy of
>    <serial/> or <console/> into virDomainDef in post parse.

Since we add the missing elements in this function, shouldn't we be
able to remove the hacky code from the format part?

I've done a quick test and it didn't work, obviously :) but maybe
we're just failing to update some parts of the definition somewhere
else, and fixing those instances would allow to restore some sanity
to the formatting routine?

> And when talking to QEMU we need a special handling too, because
> while <serial/> is generated on the cmd line, the <console/> is
> not. And in a lot of place we get it right. Except for generating
> device aliases. On domain startup the 'expected' happens and
> devices get "serial0" and "console0" aliases, correspondingly.
> This ends up in the status XML too. But due to aforementioned
> trick when formatting domain XML, "serial0" ends up in both
> 'virsh dumpxml' and the status XML. But internally, both devices
> have different alias. Therefore, detaching the device using
> <console/> fails as qemuDomainDetachDeviceChr() tries to detach
> "console0".
>
> After the daemon is restarted and status XML is parsed, then
> everything works suddenly. This is because in the status XML both
> devices have the same alias.

So is the fact that two devices end up with the same alias something
that we consider okay? When it comes to user aliases, such a
configuration is (understandably) rejected.

Clearly nothing explodes too badly in this situation, as evidenced by
the fact that things actually start working more reasonably once the
two devices get matching aliases, but I wonder if it wouldn't be more
sensible to assign different aliases for the <console> and the
<serial>, then handle compatibility further down, closer to where we
generate command line arguments and monitor commands?

> Let's generate correct alias from the beginning.
>
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2156300

After your patches, the "unknown cause" error is no longer reported,
and instead we get

  unable to execute QEMU command 'device_del': Bus 'isa.0' does not
support hotplugging

which is clearly a better error message.

Doesn't really help with the fact that it's fundamentally impossible
to hot-unplug platform devices, unfortunately :)

> +    /* Some crazy backcompat for consoles. Look into
> +     * virDomainDefAddConsoleCompat() for more explanation. */
> +    if (chr->deviceType == VIR_DOMAIN_CHR_DEVICE_TYPE_CONSOLE &&
> +        chr->targetType == VIR_DOMAIN_CHR_CONSOLE_TARGET_TYPE_SERIAL &&
> +        def->os.type == VIR_DOMAIN_OSTYPE_HVM &&
> +        def->consoles[0] == chr &&
> +        def->nserials &&
> +        def->serials[0]->info.alias) {
> +        chr->info.alias = g_strdup(def->serials[0]->info.alias);
> +        return 0;
> +    }

This code is... Fine. It does the trick.

The only reason why I'm not ACKing it right away is that I'm
concerned we're piling hacks on top of hacks, and in particular that
allowing multiple devices to have the same alias will cause grief
further down the line.

If you have already considered this and convinced yourself that we're
okay, then say so and I'll gladly ACK the patch.

-- 
Andrea Bolognani / Red Hat / Virtualization
Re: [PATCH] qemu_alias: Fix backcompat console alias generation
Posted by Michal Prívozník 1 year, 2 months ago
On 3/14/23 20:33, Andrea Bolognani wrote:
> On Tue, Mar 14, 2023 at 11:20:49AM +0100, Michal Privoznik wrote:
>> We have this crazy backwards compatibility when it comes to
>> serial and console devices.
> 
> This truly is the gift that keeps on giving :/
> 
>> Basically, in same cases the very
>> first <console/> is just an alias to the very first <serial/>
>> device. This is to be seen at various places:
>>
>> 1) virDomainDefFormatInternalSetRootName() - when generating
>>    domain XML, the <console/> configuration is basically ignored
>>    and corresponding <serial/> config is formatted,
> 
> Wow, I wasn't aware of that part of the insanity. That got me
> confused for a bit: dumpxml on a running domain was showing the
> expected alias for the <console>. Now I understand why! It was
> just displaying the information for the <serial> instead.

Indeed. Even though the internal struct holds different data. Hence the
mismatch.

> 
>> 2) virDomainDefAddConsoleCompat() - which adds a copy of
>>    <serial/> or <console/> into virDomainDef in post parse.
> 
> Since we add the missing elements in this function, shouldn't we be
> able to remove the hacky code from the format part?

That's what I wanted to do. But A LOT of tests started failing. But I
blame my code for that. Let me see if I can do better today.

But at any rate, we would need to keep some parts of the code that
handle the aliasing - e.g. in qemuDomainChrRemove(), cmd line
generation, etc. In the end, the deciding factor for me was - in
majority of cases [1] we will just waste memory, as the def->console[0]
would be populated, but ignored. And given that my initial code didn't
work I decided to hack around the problem.

BTW: even then we still would need to keep this patch (or its variant),
because the problem would still be there. I mean, the backcompat console
handling is done in PostParse. Alias assignment is done after PostParse
(when starting the domain). So regardless of what we do in PostParse -
the aliased console would still get different alias then the serial
device. Simply because that's how qemuAssignDeviceChrAlias() is written.

But okay, we could work around that - we have qemuProcessPrepareDomain()
where we could fix the aliases. But what about other drivers? So far,
the CH driver is the only one where this crazy backcompat is not
happening (as of v9.1.0-rc1~63). But whatever we do, we would need to
just push the code from src/conf/ into the drivers.

1: majority, because by default we add serial console (as
virDomainDefAddConsoleCompat() demonstrates).

> 
> I've done a quick test and it didn't work, obviously :) but maybe
> we're just failing to update some parts of the definition somewhere
> else, and fixing those instances would allow to restore some sanity
> to the formatting routine?
> 
>> And when talking to QEMU we need a special handling too, because
>> while <serial/> is generated on the cmd line, the <console/> is
>> not. And in a lot of place we get it right. Except for generating
>> device aliases. On domain startup the 'expected' happens and
>> devices get "serial0" and "console0" aliases, correspondingly.
>> This ends up in the status XML too. But due to aforementioned
>> trick when formatting domain XML, "serial0" ends up in both
>> 'virsh dumpxml' and the status XML. But internally, both devices
>> have different alias. Therefore, detaching the device using
>> <console/> fails as qemuDomainDetachDeviceChr() tries to detach
>> "console0".
>>
>> After the daemon is restarted and status XML is parsed, then
>> everything works suddenly. This is because in the status XML both
>> devices have the same alias.
> 
> So is the fact that two devices end up with the same alias something
> that we consider okay? When it comes to user aliases, such a
> configuration is (understandably) rejected.
> 
> Clearly nothing explodes too badly in this situation, as evidenced by
> the fact that things actually start working more reasonably once the
> two devices get matching aliases, but I wonder if it wouldn't be more
> sensible to assign different aliases for the <console> and the
> <serial>, then handle compatibility further down, closer to where we
> generate command line arguments and monitor commands?
> 
>> Let's generate correct alias from the beginning.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2156300
> 
> After your patches, the "unknown cause" error is no longer reported,
> and instead we get
> 
>   unable to execute QEMU command 'device_del': Bus 'isa.0' does not
> support hotplugging
> 
> which is clearly a better error message.

Indeed. This still papers over the original problem (the "unknown
cause"). But that's because in qemuDomainDeleteDevice() the rc = -2 path
is used. Obviously. I mean, the comment in that path says it all:

  The device does not exist in qemu, but it still exists in libvirt.

which is obviously true, because we've asked QEMU to detach console0
which does not exist in QEMU.

> 
> Doesn't really help with the fact that it's fundamentally impossible
> to hot-unplug platform devices, unfortunately :)
> 
>> +    /* Some crazy backcompat for consoles. Look into
>> +     * virDomainDefAddConsoleCompat() for more explanation. */
>> +    if (chr->deviceType == VIR_DOMAIN_CHR_DEVICE_TYPE_CONSOLE &&
>> +        chr->targetType == VIR_DOMAIN_CHR_CONSOLE_TARGET_TYPE_SERIAL &&
>> +        def->os.type == VIR_DOMAIN_OSTYPE_HVM &&
>> +        def->consoles[0] == chr &&
>> +        def->nserials &&
>> +        def->serials[0]->info.alias) {
>> +        chr->info.alias = g_strdup(def->serials[0]->info.alias);
>> +        return 0;
>> +    }
> 
> This code is... Fine. It does the trick.
> 
> The only reason why I'm not ACKing it right away is that I'm
> concerned we're piling hacks on top of hacks, and in particular that
> allowing multiple devices to have the same alias will cause grief
> further down the line.

Yeah, I was worried about non-unique alias too. Until I realized that
after the daemon is restarted, the status XML is parsed and the aliases
ARE the same. IOW, we're exactly at the same position as after this patch.

> 
> If you have already considered this and convinced yourself that we're
> okay, then say so and I'll gladly ACK the patch.
> 

Yeah, I'm not proud of this patch either. It's not something I'd be
showing to others at parties. But at the same time, I think this is the
only sensible solution. MAYBE, there's another one - writing similar
hack into qemuDomainDetachDeviceChr(). But that fixes just the
device-detach case. I don't know if there's another place where this
aliasing is causing problems, so maybe we could do just that and not use
this big hammer?

Michal
Re: [PATCH] qemu_alias: Fix backcompat console alias generation
Posted by Andrea Bolognani 1 year, 2 months ago
On Wed, Mar 15, 2023 at 08:51:33AM +0100, Michal Prívozník wrote:
> On 3/14/23 20:33, Andrea Bolognani wrote:
> > On Tue, Mar 14, 2023 at 11:20:49AM +0100, Michal Privoznik wrote:
> >> 2) virDomainDefAddConsoleCompat() - which adds a copy of
> >>    <serial/> or <console/> into virDomainDef in post parse.
> >
> > Since we add the missing elements in this function, shouldn't we be
> > able to remove the hacky code from the format part?
>
> That's what I wanted to do. But A LOT of tests started failing. But I
> blame my code for that. Let me see if I can do better today.

Any progress at all towards making that part of libvirt more
reasonable would be much appreciated. If you feel like spending some
time looking into this, I'll be happy to review the results :)

> BTW: even then we still would need to keep this patch (or its variant),
> because the problem would still be there. I mean, the backcompat console
> handling is done in PostParse. Alias assignment is done after PostParse
> (when starting the domain). So regardless of what we do in PostParse -
> the aliased console would still get different alias then the serial
> device. Simply because that's how qemuAssignDeviceChrAlias() is written.
>
> But okay, we could work around that - we have qemuProcessPrepareDomain()
> where we could fix the aliases. But what about other drivers? So far,
> the CH driver is the only one where this crazy backcompat is not
> happening (as of v9.1.0-rc1~63). But whatever we do, we would need to
> just push the code from src/conf/ into the drivers.

Yeah, I hadn't considered the fact that moving the handling down to
the driver effectively means having to make multiple copies of it.

Perhaps there's a way we can organize the code so that most of the
logic still lives in common code, but it does make your solution look
more appealing.

> > The only reason why I'm not ACKing it right away is that I'm
> > concerned we're piling hacks on top of hacks, and in particular that
> > allowing multiple devices to have the same alias will cause grief
> > further down the line.
>
> Yeah, I was worried about non-unique alias too. Until I realized that
> after the daemon is restarted, the status XML is parsed and the aliases
> ARE the same. IOW, we're exactly at the same position as after this patch.

Right, the fact that we need to successfully parse back a status XML
(or an incoming migration XML) where the alias is the same means that
we're never going to be able to add a sanity check for the absence of
duplicated (non user-provided) aliases - unless of course we add even
more hacks :)

> > If you have already considered this and convinced yourself that we're
> > okay, then say so and I'll gladly ACK the patch.
>
> Yeah, I'm not proud of this patch either. It's not something I'd be
> showing to others at parties. But at the same time, I think this is the
> only sensible solution. MAYBE, there's another one - writing similar
> hack into qemuDomainDetachDeviceChr(). But that fixes just the
> device-detach case. I don't know if there's another place where this
> aliasing is causing problems, so maybe we could do just that and not use
> this big hammer?

Another way to look at it is that the two devices having the same
alias sort of make sense because they *do* represent the same device,
and not just at the QEMU driver level.

All in all, handling these devices is always going to suck. After
your changes at least the internal consistency is improved, which I
guess is the best we can realistically hope for.

Reviewed-by: Andrea Bolognani <abologna@redhat.com>

-- 
Andrea Bolognani / Red Hat / Virtualization