On Mon, Oct 10, 2022 at 04:16:33PM -0700, David Woodhouse wrote:
> On Mon, 2022-10-10 at 15:08 -0400, Peter Xu wrote:
> > On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote:
> > > On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote:
> > > > From: Peter Xu <
> > > > peterx@redhat.com
> > > >
> > > >
> > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC
> > > > IDs. However here since we already have EIM=ON it means the hypervisor
> > > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have
> > > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
> > > > even if vcpus<=255. In short, commit 77250171bdc breaks any simple cmdline
> > > > that wants to boot a VM with >=9 but <=255 vcpus with:
> > >
> > > I find that paragraph really hard to parse. What does it even mean that
> > > "guest should assume the APIC IDs are 32bits"?
> >
> > Quotting EIM definition:
> >
> > 0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC
> > Mode).
> >
> > 1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC
> > mode). Hardware implementation reporting Interrupt Remapping support
> > (IR) field as Clear also report this field as Clear.
> >
> > I hope the statement was matching the spec. Please let me know if you have
> > better way to reword it.
>
> It needs to mention logical mode addressing. Because that, I presume,
> is why it broke only when you had more than 8 vCPUs. Because that's
> when the *logical* destination ID grew past 0xFF.
Agree.
>
> > > In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the
> > > tables. Which is perfectly fine if there are only 254 CPUs anyway, and
> > > we never need to use a higher value.
> > >
> > > I *think* the actual problem here is when logical addressing is used,
> > > which puts the APIC cluster ID into higher bits? But it's kind of weird
> > > that the message doesn't mention that at all?
> >
> > The commit message actually doesn't even need to contain a lot of
> > information in this case, IMO.
>
> Well, it would be kind of useful if it said what the actual problem
> was, no?
Yes it'll be nice to have.
>
> > Literally it can be seen as a revert of a commit which breaks guest with
> > > 8vcpu from boot. I kept the other lines because that still make sense, or
> >
> > it can be a full revert with "something broke with commit xxx, revert it to
> > fix" and anything else could be reworked. AFAICT that's how it normally
> > works with QEMU or Linux.
> >
> > I am not 100% familiar with the original purpose of the patch, would
> > eim=off work for you even after patch applied? Anything severely wrong
> > with this patch?
>
> I think the patch itself is fine; I'd just like the commit message to
> be clearer about what the problem was.
Thanks for confirming.
>
> > > That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI
> > > FADT, isn't it? Then the only values that a guest may put into those
> > > fields — 32-bit fields or not — are lower than 0xff anyway.
> >
> > It's still not clear to me why we need to make it inconsistent between the
> > EIM we declare to the guest and the KVM behavior on understanding EIM bit.
> > Even if enforced physical mode will work we loose the possibility of
> > cluster mode, and I also don't see what's the major benefit since EIM=off
> > will just work, afaiu, meanwhile make everything aligned.
>
> Yeah, I think turning EIM off is absolutely fine.
>
> > Are you fine if we proceed with this pull request first and revisit later?
> > Follow up patches will always be fine, and we're unbreaking something. I
> > have copied you since the 1st patch I posted and the small patch was there
> > for weeks, it'll be appreciated if either you could comment earlier next
> > time, or even propose a better fix then we can discuss what's the best way
> > to fix. Thanks.
>
> Yeah, sorry for the delay. But that was partly because the commit
> message was confusing me and it took me a while to work out what was
> actually going on... which is really all I'm heckling now.
I see, that was totally fine, and it'll be definitely also fine to comment
anything even on the pull req. It's just that as I tried to argue for this
specific case IMHO we should move on and revisit later so we shrink the
regression window, rather than redo a pull and let this fix wait for
another one. It seems we reached a consensus on this, thanks for that.
In all cases (irrelevant of the pull req), feel free to post any patch
either based on this one or as replacement. I'll be happy to read and
rethink. So far it still doesn't make sense to me to not enable kvm x2apic
with eim=on, but maybe I'm wrong, and I'd be happy to be corrected in that
case.
Thanks,
--
Peter Xu