The series enables Multi-Threaded TCG on PPC64

Patch 01: Use atomic_cmpxchg in store conditional
      02: Handle first write to page during atomic operation
      03: Generate memory barriers for sync/isync and load/store conditional

Patches are based on ppc-for-2.10

Tested using following:
./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img

Todo:
* Enable other machine types and PPC32.
* More testing for corner cases.

Nikunj A Dadhania (3):
  target/ppc: Emulate LL/SC using cmpxchg helpers
  cputlb: handle first atomic write to the page
  target/ppc: Generate fence operations

 cputlb.c               |  8 +++++++-
 target/ppc/translate.c | 29 ++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 4 deletions(-)

--
2.9.3
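To illustrate the idea behind patch 01, here is a minimal sketch of emulating a load-reserve/store-conditional pair with a compare-and-swap. It is not the code from the series: it uses C11 atomics rather than QEMU's TCG helpers, and `vcpu_reservation`, `emu_load_reserve` and `emu_store_conditional` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU reservation state (not QEMU's actual fields). */
typedef struct {
    _Atomic uint32_t *reserve_addr; /* address of the last load-reserve, or NULL */
    uint32_t reserve_val;           /* value that load observed */
} vcpu_reservation;

/* Load-and-reserve (lwarx-like): remember the address and the value read. */
static uint32_t emu_load_reserve(vcpu_reservation *r, _Atomic uint32_t *addr)
{
    r->reserve_addr = addr;
    r->reserve_val = atomic_load(addr);
    return r->reserve_val;
}

/*
 * Store-conditional (stwcx.-like): succeed only if the reservation is for
 * this address and memory still holds the value seen by the load-reserve.
 * The compare-and-swap makes the check and the store a single atomic step.
 */
static bool emu_store_conditional(vcpu_reservation *r,
                                  _Atomic uint32_t *addr, uint32_t newval)
{
    bool ok = false;

    if (r->reserve_addr == addr) {
        uint32_t expected = r->reserve_val;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    r->reserve_addr = NULL; /* the reservation is consumed either way */
    return ok;
}

/* Small usage example: one successful pair, then a store without a reservation. */
static void reservation_demo(void)
{
    _Atomic uint32_t word = 41;
    vcpu_reservation r = { NULL, 0 };

    uint32_t seen = emu_load_reserve(&r, &word);
    assert(emu_store_conditional(&r, &word, seen + 1));  /* reservation held */
    assert(!emu_store_conditional(&r, &word, 0));        /* no reservation left */
    assert(atomic_load(&word) == 42);
}
```

One known limitation of this style of emulation: because the check compares values, it cannot notice a location that was changed and then changed back between the load and the store (the ABA case), which a real hardware reservation would catch.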
Hello Nikunj,

On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
> The series enables Multi-Threaded TCG on PPC64
>
> Patch 01: Use atomic_cmpxchg in store conditional
>       02: Handle first write to page during atomic operation
>       03: Generate memory barriers for sync/isync and load/store conditional
>
> Patches are based on ppc-for-2.10
>
> Tested using following:
> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img

I tried it with an Ubuntu 16.04.2 guest using stress --cpu 8. It looked
good: the CPU usage of QEMU reached 760% on the host.

> Todo:
> * Enable other machine types and PPC32.

I am quite ignorant on the topic.
Have you looked at what it would take to emulate support of the HW
threads? And the PowerNV machine?

Thanks,

C.

> * More testing for corner cases.
>
> Nikunj A Dadhania (3):
>   target/ppc: Emulate LL/SC using cmpxchg helpers
>   cputlb: handle first atomic write to the page
>   target/ppc: Generate fence operations
>
>  cputlb.c               |  8 +++++++-
>  target/ppc/translate.c | 29 ++++++++++++++++++++++++++---
>  2 files changed, 33 insertions(+), 4 deletions(-)
>
On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote:

> I tried it with an Ubuntu 16.04.2 guest using stress --cpu 8. It looked
> good: the CPU usage of QEMU reached 760% on the host.

What was your guest operating system?
On 04/06/2017 03:28 PM, G 3 wrote:
> On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote:
>
>> I tried it with an Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>> good: the CPU usage of QEMU reached 760% on the host.
>
> What was your guest operating system?

The guest is an Ubuntu 16.04.2 and the host is an Ubuntu 17.04.

C.
On Apr 6, 2017, at 9:32 AM, Cédric Le Goater wrote:

> On 04/06/2017 03:28 PM, G 3 wrote:
>> What was your guest operating system?
>
> The guest is an Ubuntu 16.04.2 and the host is an Ubuntu 17.04.
>
> C.

Thank you for the information. What you could do is run QEMU in
emulation mode (non-KVM mode) and time how long it takes Ubuntu to boot
with one emulated core versus how long it takes with, say, four emulated
cores. This would be a good start:

Boot up times:
one core:
two cores:
four cores:
eight cores:
Hi,

I can help test it too on my two Be machines, if someone can help me
find where the patches are or where I can download the commits.

Thanks
Luigi
On Apr 6, 2017, at 1:08 PM, luigi burdo wrote:

> Hi,
>
> I can help test it too on my two Be machines, if someone can help me
> find where the patches are or where I can download the commits.
>
> Thanks
> Luigi

Here are the patches:
1/3 https://patchwork.ozlabs.org/patch/747691/
2/3 https://patchwork.ozlabs.org/patch/747692/
3/3 https://patchwork.ozlabs.org/patch/747688/
Cédric Le Goater <clg@kaod.org> writes:

> I tried it with an Ubuntu 16.04.2 guest using stress --cpu 8. It looked
> good: the CPU usage of QEMU reached 760% on the host.

Cool.

>> Todo:
>> * Enable other machine types and PPC32.
>
> I am quite ignorant on the topic.
> Have you looked at what it would take to emulate support of the HW
> threads?

We would need to implement msgsndp (doorbell support for IPI between
threads of the same core).

> And the PowerNV machine?

Haven't tried it, should work. Just give it a shot, and let me know if
you see problems.

Regards
Nikunj
On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
>> Have you looked at what it would take to emulate support of the HW
>> threads?
>
> We would need to implement msgsndp (doorbell support for IPI between
> threads of the same core).

OK, I get it. Thanks,

>> And the PowerNV machine?
>
> Haven't tried it, should work. Just give it a shot, and let me know if
> you see problems.

Sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
patches and tell you.

Thanks,

C.
On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>> Haven't tried it, should work. Just give it a shot, and let me know if
>> you see problems.
>
> Sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
> patches and tell you.

The system seems to be spinning in skiboot in cpu_idle/relax when
starting the Linux kernel. It finally boots, but it takes rather long.
David has merged enough to test if you want to give it a try.

Cheers,

C.
Cédric Le Goater <clg@kaod.org> writes:

> The system seems to be spinning in skiboot in cpu_idle/relax when
> starting the Linux kernel. It finally boots, but it takes rather long.
> David has merged enough to test if you want to give it a try.

I have got your powernv-ipmi-2.9 + ppc64 mttcg patches and am testing
them. I too saw a delay during boot, but wasn't aware that it's caused
by mttcg. I will have a look.

Regards
Nikunj
On 04/10/2017 06:44 PM, Nikunj A Dadhania wrote:
> I have got your powernv-ipmi-2.9 + ppc64 mttcg patches and am testing
> them. I too saw a delay during boot, but wasn't aware that it's caused
> by mttcg. I will have a look.

You can use David's branch directly now, there is enough support.

I am not sure where that is exactly, the kernel is somewhere in
early_setup(). It might be the secondary spinloop.

Thanks,

C.
Cédric Le Goater <clg@kaod.org> writes:

> You can use David's branch directly now, there is enough support.

Sure

> I am not sure where that is exactly, the kernel is somewhere in
> early_setup(). It might be the secondary spinloop.

A lot of prints are missing, I think I need to add a console.

[ 2.303286014,5] INIT: Starting kernel at 0x20010000, fdt at 0x30354908 14865 bytes)
[ 43.421998779,5] OPAL: Switch to little-endian OS
 -> smp_release_cpus()
spinning_secondaries = 3
 <- smp_release_cpus()
[ 0.260526] nvram: Failed to find or create lnx,oops-log partition, err -28
[ 0.264448] nvram: Failed to initialize oops partition!

Regards,
Nikunj
Cédric Le Goater <clg@kaod.org> writes:

> The system seems to be spinning in skiboot in cpu_idle/relax when
> starting the Linux kernel. It finally boots, but it takes rather long.
> David has merged enough to test if you want to give it a try.

Does PPC have Wait-for-irq or similar "sleeping" instructions?

We had to ensure we were not jumping out of the cpu loop and suspend
normally.

See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.

--
Alex Bennée
On 04/10/2017 07:20 PM, Alex Bennée wrote:
> Does PPC have Wait-for-irq or similar "sleeping" instructions?
>
> We had to ensure we were not jumping out of the cpu loop and suspend
> normally.

I really don't know.

Ben, now that we have mttcg activated by default on ppc, it takes
a while for the Linux kernel to do the early setup. I think we are
in the code section where we spin loop the secondaries. Most of the
time is spent in skiboot under cpu_idle/relax.

Any idea where that could come from?

> See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.

Thanks for the hint.

C.
On Tue, 2017-04-11 at 14:28 +0200, Cédric Le Goater wrote:
> I really don't know.
>
> Ben, now that we have mttcg activated by default on ppc, it takes
> a while for the Linux kernel to do the early setup. I think we are
> in the code section where we spin loop the secondaries. Most of the
> time is spent in skiboot under cpu_idle/relax.
>
> Any idea where that could come from?
>
> > See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.
>
> Thanks for the hint.

They are spinning, but they have smt_low instructions in the loop;
maybe that causes us to do some kind of synchronization as we exit
the emulation loop on these? I added that to force relinquishing time
to other threads on the pre-MT TCG...

There isn't really such a "pause" instruction. At least not yet... P9
has a wait that is meant to wait for special AS_Notify cycles but will
also wait for interrupts. We don't have an mwait at this point.

Ben.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> They are spinning, but they have smt_low instructions in the loop;
> maybe that causes us to do some kind of synchronization as we exit
> the emulation loop on these? I added that to force relinquishing time
> to other threads on the pre-MT TCG...

Yeah, you need to tweak the approach when running with MTTCG, as
otherwise you end up causing exits from one vCPU's loop to yield to
vCPUs that are already running in other threads.

> There isn't really such a "pause" instruction. At least not yet... P9
> has a wait that is meant to wait for special AS_Notify cycles but will
> also wait for interrupts. We don't have an mwait at this point.

They are worth implementing. FWIW on ARM we only really handle WFI
(Wait-for-interrupt), which will cause the EXCP_HALT, and that will put
the vCPU into a halted state which can be woken up by the next
interrupt. For the other cases, YIELD and WFE (wait-for-event), we just
treat them as NOPs when MTTCG is enabled (test parallel_cpus). So they
will busy-wait spin around the guest's WFE code but don't trigger
expensive longjmps out of the execution loop. This was all done in:

  c22edfebff target-arm: don't generate WFE/YIELD calls for MTTCG

One other thing I noticed while looking through the PPC stuff is that I
couldn't see any handling of cpu_reset/powering on. There is a
potential race here which ThreadSanitizer will complain about if you
start loading up your about-to-be-powered-on-vCPU from another thread.

The fix here is to defer the setup with async work. See:

  062ba099e0 target-arm/powerctl: defer cpu reset work to CPU context

--
Alex Bennée
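The policy described above for idle-hint instructions can be written down as a small decision function. This is only a sketch of the reasoning, not QEMU code: the enums and `pick_idle_action` are hypothetical names, and the single-threaded branch (leave the execution loop so another vCPU can be scheduled) is an assumption drawn from the remark that the hint was added to hand time to other threads on pre-MT TCG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of guest idle hints (ARM names used as examples). */
typedef enum { HINT_WFI, HINT_WFE, HINT_YIELD } idle_hint;

/* Hypothetical actions the translator could choose for such a hint. */
typedef enum {
    ACT_HALT,      /* halt the vCPU until the next interrupt wakes it */
    ACT_EXIT_LOOP, /* leave the execution loop so another vCPU can run */
    ACT_NOP        /* emit nothing; the guest just keeps spinning */
} idle_action;

/*
 * parallel_cpus stands for "MTTCG is active, each vCPU has its own host
 * thread". In that mode yielding buys nothing, because the other vCPUs
 * are already running, so the costly exit from the loop is skipped.
 */
static idle_action pick_idle_action(idle_hint hint, bool parallel_cpus)
{
    switch (hint) {
    case HINT_WFI:
        return ACT_HALT; /* a real wait-for-interrupt halts in both modes */
    case HINT_WFE:
    case HINT_YIELD:
        return parallel_cpus ? ACT_NOP : ACT_EXIT_LOOP;
    }
    return ACT_NOP; /* unknown hints are treated as no-ops */
}

/* Usage example covering both execution modes. */
static void idle_policy_demo(void)
{
    assert(pick_idle_action(HINT_WFE, true) == ACT_NOP);
    assert(pick_idle_action(HINT_WFE, false) == ACT_EXIT_LOOP);
    assert(pick_idle_action(HINT_WFI, true) == ACT_HALT);
}
```

By the same reasoning, a PPC priority hint such as smt_low would fall into the WFE/YIELD bucket when MTTCG is enabled, though whether that accounts for the slow boot reported in this thread is not established here.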