RE: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode

Babu Moger posted 8 patches 5 years, 2 months ago
Only 0 patches received!
There is a newer version of this series
RE: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Posted by Babu Moger 5 years, 2 months ago

> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Thursday, August 27, 2020 4:19 PM
> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> Moger, Babu <Babu.Moger@amd.com>; pbonzini@redhat.com;
> rth@twiddle.net
> Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> generic decode
> 
> On Wed, 26 Aug 2020 15:10:46 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Tue, 25 Aug 2020 16:25:21 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > > > On Tue, 25 Aug 2020 09:15:04 +0100 "Dr. David Alan Gilbert"
> > > > > <dgilbert@redhat.com> wrote:
> > > > >
> > > > > > * Babu Moger (babu.moger@amd.com) wrote:
> > > > > > > Hi Dave,
> > > > > > >
> > > > > > > On 8/24/20 1:41 PM, Dr. David Alan Gilbert wrote:
> > > > > > > > * Babu Moger (babu.moger@amd.com) wrote:
> > > > > > > >> To support some of the complex topology, we introduced EPYC
> mode apicid decode.
> > > > > > > >> But, EPYC mode decode is running into problems. Also it
> > > > > > > >> can become quite a maintenance problem in the future. So,
> > > > > > > >> it was decided to remove that code and use the generic
> > > > > > > >> decode which works for majority of the topology. Most of
> > > > > > > >> the SPECed configuration would work just fine. With some
> non-SPECed user inputs, it will create some sub-optimal configuration.
> > > > > > > >> Here is the discussion thread.
> > > > > > > >> https://nam11.safelinks.protection.outlook.com/?url=https
> > > > > > > >> %3A%2F%2Flore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-
> 1d84-a6e
> > > > > > > >> 7-e468-
> d5b437c1b254%40amd.com%2F&amp;data=02%7C01%7Cbabu.
> > > > > > > >>
> moger%40amd.com%7C9b15ee395daa4935640408d84acedf13%7C3dd8
> > > > > > > >>
> 961fe4884e608e11a82d994e183d%7C0%7C0%7C637341599663177545
> > > > > > > >>
> &amp;sdata=4okYGU%2F8QTYqEOZEd1EBC%2BEsIIrEV59HZrHzpbsR8s
> > > > > > > >> U%3D&amp;reserved=0
> > > > > > > >>
> > > > > > > >> This series removes all the EPYC mode specific apicid changes
> and use the generic
> > > > > > > >> apicid decode.
> > > > > > > >
> > > > > > > > Hi Babu,
> > > > > > > >   This does simplify things a lot!
> > > > > > > > One worry, what happens about a live migration of a VM from
> an old qemu
> > > > > > > > that was using the node-id to a qemu with this new scheme?
> > > > > > >
> > > > > > > The node_id which we introduced was only used internally. This
> wasn't
> > > > > > > exposed outside. I don't think live migration will be an issue.
> > > > > >
> > > > > > Didn't it become part of the APIC ID visible to the guest?
> > > > >
> > > > > Daniel asked similar question wrt hard error on start up, when
> > > > > CLI is not sufficient to create EPYC cpu.
> > > > >
> > > > >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%
> > > > > 2Fwww.mail-archive.com%2Fqemu-
> devel%40nongnu.org%2Fmsg728536.htm
> > > > >
> l&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C9b15ee395daa49356
> 404
> > > > >
> 08d84acedf13%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63734
> 1
> > > > >
> 599663177545&amp;sdata=OnHz23W4F4TdYwlxPZwC%2B8YRY1K3qJ5U9Sfdo
> Oc
> > > > > GXtw%3D&amp;reserved=0
> > > > >
> > > > > Migration might fall into the same category.
> > > > > Also looking at the history, 5.0 commit
> > > > >   247b18c593ec29 target/i386: Enable new apic id encoding for
> > > > > EPYC based cpus models silently broke APIC ID (without versioning),
> for all EPYC models (that's were 1 new and 1 old one).
> > > > >
> > > > > (I'm not aware of somebody complaining about it)
> > > > >
> > > > > Another commit ed78467a21459, changed CPUID_8000_001E without
> versioning as well.
> > > > >
> > > > >
> > > > > With current EPYC apicid code, if all starts align (no numa or 1
> > > > > numa node only on CLI and no -smp dies=) it might produce a valid
> CPU (apicid+CPUID_8000_001E).
> > > > > No numa is gray area, since EPYC spec implies that it has to be numa
> machine in case of real EPYC cpus.
> > > > > Multi-node configs would be correct only if user assigns cpus to
> > > > > numa nodes by duplicating internal node_id algorithm that this series
> removes.
> > > > >
> > > > > There might be other broken cases that I don't recall anymore
> > > > > (should be mentioned in previous versions of this series)
> > > > >
> > > > >
> > > > > To summarize from migration pov (ignoring ed78467a21459 change):
> > > > >
> > > > >  1) old qemu pre-5.0 ==>  qemu 5.0, 5.1 - broken migration
> > > >
> > > > Oh ....
> > > >
> > > > >  2) with this series (lets call it qemu 5.2)
> > > > >      pre-5.0 ==> qemu 5.2 - should work as series basically rollbacks
> current code to pre-5.0
> > > > >      qemu 5.0, 5.1 ==> qemu 5.2 - broken
> > > > >
> > > > > It's all about picking which poison to choose, I'd preffer 2nd
> > > > > case as it lets drop a lot of complicated code that doesn't work
> > > > > as expected.
> > > >
> > > > I think that would make our lives easier for other reasons; so I'm
> > > > happy to go with that.
> > >
> > > to make things less painful for users, me wonders if there is a way
> > > to block migration if epyc and specific QEMU versions are used?
> >
> > We have no way to block based on version - and that's a pretty painful
> > thing to do; we can block based on machine type.
> >
> > But before we get there; can we understand in which combinations that
> > things break and why exactly - would it break on a 1 or 2 vCPU guest -
> > or would it only break when we get to the point the upper bits start
> > being used for example?  Why exaclty would it break - i.e. is it going
> > to change the name of sections in the migration stream - or are the
> > values we need actually going to migrate OK?
> 
> it's values of APIC ID, where 4.2 and 5.0 QEMU use different values if numa is
> enabled.
> I'd expect guest to be very confused in when this happens.
> 
> here is an example:
> qemu-4.2 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3 -numa
> node,cpus=4-7
> 
> (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id {
>     "return": 7
> }
> 
> vs
> 
> qemu-5.1 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3 -numa
> node,cpus=4-7
> (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id {
>     "return": 15
> }
> 
> we probably can't do anything based on machine type versions, as
> 4.2 and older versions on qemu-5.0 and newer use different algorithm to
> calculate apic-id.
> 
> Hence was suggestion to leave 5.0/5.1 with broken apic id and revert back to
> 4.2 algorithm, which should encode APIC ID correctly when '-smp dies' is
> used.

That is correct. When we revert all the node_id related changes, we will
go back to 4.2 algorithm. It will work fine with user passing "-smp
dies=n". It also keeps the code simple. That is why I kept the decoding of
0x8000001e like this below. This will also match apicid decoding.

*ecx = ((topo_info->dies_per_pkg - 1) << 8) |  ((cpu->apic_id >>
apicid_die_offset(topo_info)) & 0xFF);


Still not clear if we need to add a warning when numa nodes != dies.
Worried about adding that check and remove it again later.

What about auto_enable_numa? Do we still need it?

I can send the patches tomorrow if these things are clarified.
Thanks

> 
> 
> > Dave
> >
> >
> > > > > PS:
> > > > >  I didn't review it yet, but with this series we aren't  making
> > > > > up internal node_ids that should match user provided numa node ids
> somehow.
> > > > >  It seems series lost the patch that would enforce numa in case
> > > > > -smp dies>1,  but otherwise it heads in the right direction.
> > > >
> > > > Dave
> > > >
> > > > > >
> > > > > > Dave
> > > > > >
> > > > >
> > >


Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Posted by Igor Mammedov 5 years, 2 months ago
On Thu, 27 Aug 2020 17:58:01 -0500
Babu Moger <babu.moger@amd.com> wrote:

> > -----Original Message-----
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Thursday, August 27, 2020 4:19 PM
> > To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > Moger, Babu <Babu.Moger@amd.com>; pbonzini@redhat.com;
> > rth@twiddle.net
> > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> > generic decode
> > 
> > On Wed, 26 Aug 2020 15:10:46 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > On Tue, 25 Aug 2020 16:25:21 +0100
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >  
> > > > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > > On Tue, 25 Aug 2020 09:15:04 +0100 "Dr. David Alan Gilbert"
> > > > > > <dgilbert@redhat.com> wrote:
> > > > > >  
> > > > > > > * Babu Moger (babu.moger@amd.com) wrote:  
> > > > > > > > Hi Dave,
> > > > > > > >
> > > > > > > > On 8/24/20 1:41 PM, Dr. David Alan Gilbert wrote:  
> > > > > > > > > * Babu Moger (babu.moger@amd.com) wrote:  
> > > > > > > > >> To support some of the complex topology, we introduced EPYC  
> > mode apicid decode.  
> > > > > > > > >> But, EPYC mode decode is running into problems. Also it
> > > > > > > > >> can become quite a maintenance problem in the future. So,
> > > > > > > > >> it was decided to remove that code and use the generic
> > > > > > > > >> decode which works for majority of the topology. Most of
> > > > > > > > >> the SPECed configuration would work just fine. With some  
> > non-SPECed user inputs, it will create some sub-optimal configuration.  
> > > > > > > > >> Here is the discussion thread.
> > > > > > > > >> https://nam11.safelinks.protection.outlook.com/?url=https
> > > > > > > > >> %3A%2F%2Flore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-  
> > 1d84-a6e  
> > > > > > > > >> 7-e468-  
> > d5b437c1b254%40amd.com%2F&amp;data=02%7C01%7Cbabu.  
> > > > > > > > >>  
> > moger%40amd.com%7C9b15ee395daa4935640408d84acedf13%7C3dd8  
> > > > > > > > >>  
> > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637341599663177545  
> > > > > > > > >>  
> > &amp;sdata=4okYGU%2F8QTYqEOZEd1EBC%2BEsIIrEV59HZrHzpbsR8s  
> > > > > > > > >> U%3D&amp;reserved=0
> > > > > > > > >>
> > > > > > > > >> This series removes all the EPYC mode specific apicid changes  
> > and use the generic  
> > > > > > > > >> apicid decode.  
> > > > > > > > >
> > > > > > > > > Hi Babu,
> > > > > > > > >   This does simplify things a lot!
> > > > > > > > > One worry, what happens about a live migration of a VM from  
> > an old qemu  
> > > > > > > > > that was using the node-id to a qemu with this new scheme?  
> > > > > > > >
> > > > > > > > The node_id which we introduced was only used internally. This  
> > wasn't  
> > > > > > > > exposed outside. I don't think live migration will be an issue.  
> > > > > > >
> > > > > > > Didn't it become part of the APIC ID visible to the guest?  
> > > > > >
> > > > > > Daniel asked similar question wrt hard error on start up, when
> > > > > > CLI is not sufficient to create EPYC cpu.
> > > > > >
> > > > > >  
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%  
> > > > > > 2Fwww.mail-archive.com%2Fqemu-  
> > devel%40nongnu.org%2Fmsg728536.htm  
> > > > > >  
> > l&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C9b15ee395daa49356
> > 404  
> > > > > >  
> > 08d84acedf13%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63734
> > 1  
> > > > > >  
> > 599663177545&amp;sdata=OnHz23W4F4TdYwlxPZwC%2B8YRY1K3qJ5U9Sfdo
> > Oc  
> > > > > > GXtw%3D&amp;reserved=0
> > > > > >
> > > > > > Migration might fall into the same category.
> > > > > > Also looking at the history, 5.0 commit
> > > > > >   247b18c593ec29 target/i386: Enable new apic id encoding for
> > > > > > EPYC based cpus models silently broke APIC ID (without versioning),  
> > for all EPYC models (that's were 1 new and 1 old one).  
> > > > > >
> > > > > > (I'm not aware of somebody complaining about it)
> > > > > >
> > > > > > Another commit ed78467a21459, changed CPUID_8000_001E without  
> > versioning as well.  
> > > > > >
> > > > > >
> > > > > > With current EPYC apicid code, if all starts align (no numa or 1
> > > > > > numa node only on CLI and no -smp dies=) it might produce a valid  
> > CPU (apicid+CPUID_8000_001E).  
> > > > > > No numa is gray area, since EPYC spec implies that it has to be numa  
> > machine in case of real EPYC cpus.  
> > > > > > Multi-node configs would be correct only if user assigns cpus to
> > > > > > numa nodes by duplicating internal node_id algorithm that this series  
> > removes.  
> > > > > >
> > > > > > There might be other broken cases that I don't recall anymore
> > > > > > (should be mentioned in previous versions of this series)
> > > > > >
> > > > > >
> > > > > > To summarize from migration pov (ignoring ed78467a21459 change):
> > > > > >
> > > > > >  1) old qemu pre-5.0 ==>  qemu 5.0, 5.1 - broken migration  
> > > > >
> > > > > Oh ....
> > > > >  
> > > > > >  2) with this series (lets call it qemu 5.2)
> > > > > >      pre-5.0 ==> qemu 5.2 - should work as series basically rollbacks  
> > current code to pre-5.0  
> > > > > >      qemu 5.0, 5.1 ==> qemu 5.2 - broken
> > > > > >
> > > > > > It's all about picking which poison to choose, I'd preffer 2nd
> > > > > > case as it lets drop a lot of complicated code that doesn't work
> > > > > > as expected.  
> > > > >
> > > > > I think that would make our lives easier for other reasons; so I'm
> > > > > happy to go with that.  
> > > >
> > > > to make things less painful for users, me wonders if there is a way
> > > > to block migration if epyc and specific QEMU versions are used?  
> > >
> > > We have no way to block based on version - and that's a pretty painful
> > > thing to do; we can block based on machine type.
> > >
> > > But before we get there; can we understand in which combinations that
> > > things break and why exactly - would it break on a 1 or 2 vCPU guest -
> > > or would it only break when we get to the point the upper bits start
> > > being used for example?  Why exaclty would it break - i.e. is it going
> > > to change the name of sections in the migration stream - or are the
> > > values we need actually going to migrate OK?  
> > 
> > it's values of APIC ID, where 4.2 and 5.0 QEMU use different values if numa is
> > enabled.
> > I'd expect guest to be very confused in when this happens.
> > 
> > here is an example:
> > qemu-4.2 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3 -numa
> > node,cpus=4-7
> > 
> > (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id {
> >     "return": 7
> > }
> > 
> > vs
> > 
> > qemu-5.1 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3 -numa
> > node,cpus=4-7
> > (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id {
> >     "return": 15
> > }
> > 
> > we probably can't do anything based on machine type versions, as
> > 4.2 and older versions on qemu-5.0 and newer use different algorithm to
> > calculate apic-id.
> > 
> > Hence was suggestion to leave 5.0/5.1 with broken apic id and revert back to
> > 4.2 algorithm, which should encode APIC ID correctly when '-smp dies' is
> > used.  
> 
> That is correct. When we revert all the node_id related changes, we will
> go back to 4.2 algorithm. It will work fine with user passing "-smp
> dies=n". It also keeps the code simple. That is why I kept the decoding of
> 0x8000001e like this below. This will also match apicid decoding.
> 
> *ecx = ((topo_info->dies_per_pkg - 1) << 8) |  ((cpu->apic_id >>
> apicid_die_offset(topo_info)) & 0xFF);
that will work when there is no -numa on CLI, when -numa is used,
we should use node id that user provided.
like you did in previous revision
   "[PATCH v4 1/3] i386: Simplify CPUID_8000_001E for AMD"

> Still not clear if we need to add a warning when numa nodes != dies.
> Worried about adding that check and remove it again later.
Since there is objection wrt making it error and I'd go with warning for now,
it makes life of person who have to figure what's wrong a bit easier.

> What about auto_enable_numa? Do we still need it?
>
> 
> I can send the patches tomorrow if these things are clarified.
> Thanks
With auto_enable_numa it would be cleaner as you can reuse
the same numa code to set 0x8000001e.ecx vs hardcodding it as above.

Maybe post series without auto_enable_numa so we fix migration
regression ASAP and then switch to auto_enable_numa on top.


> 
> > 
> >   
> > > Dave
> > >
> > >  
> > > > > > PS:
> > > > > >  I didn't review it yet, but with this series we aren't  making
> > > > > > up internal node_ids that should match user provided numa node ids  
> > somehow.  
> > > > > >  It seems series lost the patch that would enforce numa in case
> > > > > > -smp dies>1,  but otherwise it heads in the right direction.  
> > > > >
> > > > > Dave
> > > > >  
> > > > > > >
> > > > > > > Dave
> > > > > > >  
> > > > > >  
> > > >  
> 


RE: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Posted by Babu Moger 5 years, 2 months ago

> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Friday, August 28, 2020 3:43 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>; ehabkost@redhat.com;
> mst@redhat.com; qemu-devel@nongnu.org; pbonzini@redhat.com;
> rth@twiddle.net
> Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> generic decode
> 
> On Thu, 27 Aug 2020 17:58:01 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > > -----Original Message-----
> > > From: Igor Mammedov <imammedo@redhat.com>
> > > Sent: Thursday, August 27, 2020 4:19 PM
> > > To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > > Moger, Babu <Babu.Moger@amd.com>; pbonzini@redhat.com;
> > > rth@twiddle.net
> > > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> > > generic decode
> > >
> > > On Wed, 26 Aug 2020 15:10:46 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > > > On Tue, 25 Aug 2020 16:25:21 +0100 "Dr. David Alan Gilbert"
> > > > > <dgilbert@redhat.com> wrote:
> > > > >
> > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > > > > > On Tue, 25 Aug 2020 09:15:04 +0100 "Dr. David Alan Gilbert"
> > > > > > > <dgilbert@redhat.com> wrote:
> > > > > > >
> > > > > > > > * Babu Moger (babu.moger@amd.com) wrote:
> > > > > > > > > Hi Dave,
> > > > > > > > >
> > > > > > > > > On 8/24/20 1:41 PM, Dr. David Alan Gilbert wrote:
> > > > > > > > > > * Babu Moger (babu.moger@amd.com) wrote:
> > > > > > > > > >> To support some of the complex topology, we
> > > > > > > > > >> introduced EPYC
> > > mode apicid decode.
> > > > > > > > > >> But, EPYC mode decode is running into problems. Also
> > > > > > > > > >> it can become quite a maintenance problem in the
> > > > > > > > > >> future. So, it was decided to remove that code and
> > > > > > > > > >> use the generic decode which works for majority of
> > > > > > > > > >> the topology. Most of the SPECed configuration would
> > > > > > > > > >> work just fine. With some
> > > non-SPECed user inputs, it will create some sub-optimal configuration.
> > > > > > > > > >> Here is the discussion thread.
> > > > > > > > > >> https://nam11.safelinks.protection.outlook.com/?url=h
> > > > > > > > > >> ttps
> > > > > > > > > >> %3A%2F%2Flore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-
> > > 1d84-a6e
> > > > > > > > > >> 7-e468-
> > > d5b437c1b254%40amd.com%2F&amp;data=02%7C01%7Cbabu.
> > > > > > > > > >>
> > > moger%40amd.com%7C9b15ee395daa4935640408d84acedf13%7C3dd8
> > > > > > > > > >>
> > > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637341599663177545
> > > > > > > > > >>
> > > &amp;sdata=4okYGU%2F8QTYqEOZEd1EBC%2BEsIIrEV59HZrHzpbsR8s
> > > > > > > > > >> U%3D&amp;reserved=0
> > > > > > > > > >>
> > > > > > > > > >> This series removes all the EPYC mode specific apicid
> > > > > > > > > >> changes
> > > and use the generic
> > > > > > > > > >> apicid decode.
> > > > > > > > > >
> > > > > > > > > > Hi Babu,
> > > > > > > > > >   This does simplify things a lot!
> > > > > > > > > > One worry, what happens about a live migration of a VM
> > > > > > > > > > from
> > > an old qemu
> > > > > > > > > > that was using the node-id to a qemu with this new scheme?
> > > > > > > > >
> > > > > > > > > The node_id which we introduced was only used
> > > > > > > > > internally. This
> > > wasn't
> > > > > > > > > exposed outside. I don't think live migration will be an issue.
> > > > > > > >
> > > > > > > > Didn't it become part of the APIC ID visible to the guest?
> > > > > > >
> > > > > > > Daniel asked similar question wrt hard error on start up,
> > > > > > > when CLI is not sufficient to create EPYC cpu.
> > > > > > >
> > > > > > >
> > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%25
> > > > > > > 2Fwww.mail-archive.com%2Fqemu-
> > > devel%40nongnu.org%2Fmsg728536.htm
> > > > > > >
> > >
> l&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C9b15ee395daa49356
> > > 404
> > > > > > >
> > >
> 08d84acedf13%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63734
> > > 1
> > > > > > >
> > >
> 599663177545&amp;sdata=OnHz23W4F4TdYwlxPZwC%2B8YRY1K3qJ5U9Sfdo
> > > Oc
> > > > > > > GXtw%3D&amp;reserved=0
> > > > > > >
> > > > > > > Migration might fall into the same category.
> > > > > > > Also looking at the history, 5.0 commit
> > > > > > >   247b18c593ec29 target/i386: Enable new apic id encoding
> > > > > > > for EPYC based cpus models silently broke APIC ID (without
> > > > > > > versioning),
> > > for all EPYC models (that's were 1 new and 1 old one).
> > > > > > >
> > > > > > > (I'm not aware of somebody complaining about it)
> > > > > > >
> > > > > > > Another commit ed78467a21459, changed CPUID_8000_001E
> > > > > > > without
> > > versioning as well.
> > > > > > >
> > > > > > >
> > > > > > > With current EPYC apicid code, if all starts align (no numa
> > > > > > > or 1 numa node only on CLI and no -smp dies=) it might
> > > > > > > produce a valid
> > > CPU (apicid+CPUID_8000_001E).
> > > > > > > No numa is gray area, since EPYC spec implies that it has to
> > > > > > > be numa
> > > machine in case of real EPYC cpus.
> > > > > > > Multi-node configs would be correct only if user assigns
> > > > > > > cpus to numa nodes by duplicating internal node_id algorithm
> > > > > > > that this series
> > > removes.
> > > > > > >
> > > > > > > There might be other broken cases that I don't recall
> > > > > > > anymore (should be mentioned in previous versions of this
> > > > > > > series)
> > > > > > >
> > > > > > >
> > > > > > > To summarize from migration pov (ignoring ed78467a21459
> change):
> > > > > > >
> > > > > > >  1) old qemu pre-5.0 ==>  qemu 5.0, 5.1 - broken migration
> > > > > >
> > > > > > Oh ....
> > > > > >
> > > > > > >  2) with this series (lets call it qemu 5.2)
> > > > > > >      pre-5.0 ==> qemu 5.2 - should work as series basically
> > > > > > > rollbacks
> > > current code to pre-5.0
> > > > > > >      qemu 5.0, 5.1 ==> qemu 5.2 - broken
> > > > > > >
> > > > > > > It's all about picking which poison to choose, I'd preffer
> > > > > > > 2nd case as it lets drop a lot of complicated code that
> > > > > > > doesn't work as expected.
> > > > > >
> > > > > > I think that would make our lives easier for other reasons; so
> > > > > > I'm happy to go with that.
> > > > >
> > > > > to make things less painful for users, me wonders if there is a
> > > > > way to block migration if epyc and specific QEMU versions are used?
> > > >
> > > > We have no way to block based on version - and that's a pretty
> > > > painful thing to do; we can block based on machine type.
> > > >
> > > > But before we get there; can we understand in which combinations
> > > > that things break and why exactly - would it break on a 1 or 2
> > > > vCPU guest - or would it only break when we get to the point the
> > > > upper bits start being used for example?  Why exaclty would it
> > > > break - i.e. is it going to change the name of sections in the
> > > > migration stream - or are the values we need actually going to migrate
> OK?
> > >
> > > it's values of APIC ID, where 4.2 and 5.0 QEMU use different values
> > > if numa is enabled.
> > > I'd expect guest to be very confused in when this happens.
> > >
> > > here is an example:
> > > qemu-4.2 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3
> > > -numa
> > > node,cpus=4-7
> > >
> > > (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id
> {
> > >     "return": 7
> > > }
> > >
> > > vs
> > >
> > > qemu-5.1 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3
> > > -numa
> > > node,cpus=4-7
> > > (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id
> {
> > >     "return": 15
> > > }
> > >
> > > we probably can't do anything based on machine type versions, as
> > > 4.2 and older versions on qemu-5.0 and newer use different algorithm
> > > to calculate apic-id.
> > >
> > > Hence was suggestion to leave 5.0/5.1 with broken apic id and revert
> > > back to
> > > 4.2 algorithm, which should encode APIC ID correctly when '-smp
> > > dies' is used.
> >
> > That is correct. When we revert all the node_id related changes, we
> > will go back to 4.2 algorithm. It will work fine with user passing
> > "-smp dies=n". It also keeps the code simple. That is why I kept the
> > decoding of 0x8000001e like this below. This will also match apicid decoding.
> >
> > *ecx = ((topo_info->dies_per_pkg - 1) << 8) |  ((cpu->apic_id >>
> > apicid_die_offset(topo_info)) & 0xFF);
> that will work when there is no -numa on CLI, when -numa is used, we
> should use node id that user provided.
> like you did in previous revision
>    "[PATCH v4 1/3] i386: Simplify CPUID_8000_001E for AMD"

This might be a problem in the future with new BIOS options to change the
NPS(Nodes per Socket). Nodes and dies may not match. Then we will end up
with wrong CPUID_8000_001E encoding. That is why I wanted to keep both of
them separate. Users have the option to configure the way it matches their
bios config.


> 
> > Still not clear if we need to add a warning when numa nodes != dies.
> > Worried about adding that check and remove it again later.
> Since there is objection wrt making it error and I'd go with warning for now, it
> makes life of person who have to figure what's wrong a bit easier.
> 
> > What about auto_enable_numa? Do we still need it?
> >
> >
> > I can send the patches tomorrow if these things are clarified.
> > Thanks
> With auto_enable_numa it would be cleaner as you can reuse the same
> numa code to set 0x8000001e.ecx vs hardcodding it as above.
> 
> Maybe post series without auto_enable_numa so we fix migration
> regression ASAP and then switch to auto_enable_numa on top.
> 
> 
> >
> > >
> > >
> > > > Dave
> > > >
> > > >
> > > > > > > PS:
> > > > > > >  I didn't review it yet, but with this series we aren't
> > > > > > > making up internal node_ids that should match user provided
> > > > > > > numa node ids
> > > somehow.
> > > > > > >  It seems series lost the patch that would enforce numa in
> > > > > > > case -smp dies>1,  but otherwise it heads in the right direction.
> > > > > >
> > > > > > Dave
> > > > > >
> > > > > > > >
> > > > > > > > Dave
> > > > > > > >
> > > > > > >
> > > > >
> >