powerpc/tlb: enable arch want batched unmap tlb flush

[RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Luming Yu 1 year, 4 months ago

From: Yu Luming <luming.yu@gmail.com>

ppc always does its own tracking for batched TLB flushes. By trivially
enabling ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH on ppc, the arch can reuse the
common code in rmap, reduce overhead, and make optimizations that were not
possible without a TLB flushing context at the low architecture level.

Signed-off-by: Luming Yu <luming.yu@shingroup.cn>
---
 arch/powerpc/Kconfig                |  1 +
 arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e94e7e4bfd40..e6db84dd014a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -175,6 +175,7 @@ config PPC
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP	if PPC_RADIX_MMU
 	select ARCH_WANTS_MODULES_DATA_IN_VMALLOC	if PPC_BOOK3S_32 || PPC_8xx
 	select ARCH_WEAK_RELEASE_ACQUIRE
diff --git a/arch/powerpc/include/asm/tlbbatch.h b/arch/powerpc/include/asm/tlbbatch.h
new file mode 100644
index 000000000000..484628460057
--- /dev/null
+++ b/arch/powerpc/include/asm/tlbbatch.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARCH_PPC_TLBBATCH_H
+#define _ARCH_PPC_TLBBATCH_H
+
+struct arch_tlbflush_unmap_batch {
+	/*
+	 *
+	 */
+};
+
+static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
+{
+}
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+						struct mm_struct *mm,
+						unsigned long uaddr)
+{
+}
+
+static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+	/* ppc always does TLB flushes in batches */
+	return false;
+}
+
+static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)
+{
+}
+#endif /* _ARCH_PPC_TLBBATCH_H */
-- 
2.42.0.windows.2

Re: [RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Ritesh Harjani (IBM) 1 year, 4 months ago

Luming Yu <luming.yu@shingroup.cn> writes:

> From: Yu Luming <luming.yu@gmail.com>
>
> ppc always do its own tracking for batch tlb. By trivially enabling
> the ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH in ppc, ppc arch can re-use
> common code in rmap and reduce overhead and do optimization it could not
> have without a tlb flushing context at low architecture level.

I looked at this patch and, other than the compile failure, it
still won't optimize anything. The idea of this config is that we want
to batch all the TLB flush operations and issue them at the end. By
returning false from arch_tlbbatch_should_defer() (in this patch),
should_defer_flush() in rmap also returns false: we are saying we cannot
defer the flush, and hence we do the TLB flush in the same context as the
unmap.

Anyway, I took a quick look at ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
and I have a quick PoC for the same. I will soon post it.

-ritesh

>
> Signed-off-by: Luming Yu <luming.yu@shingroup.cn>
> ---
>  arch/powerpc/Kconfig                |  1 +
>  arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/tlbbatch.h
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e94e7e4bfd40..e6db84dd014a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -175,6 +175,7 @@ config PPC
>  	select ARCH_WANT_IPC_PARSE_VERSION
>  	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>  	select ARCH_WANT_LD_ORPHAN_WARN
> +	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
>  	select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP	if PPC_RADIX_MMU
>  	select ARCH_WANTS_MODULES_DATA_IN_VMALLOC	if PPC_BOOK3S_32 || PPC_8xx
>  	select ARCH_WEAK_RELEASE_ACQUIRE
> diff --git a/arch/powerpc/include/asm/tlbbatch.h b/arch/powerpc/include/asm/tlbbatch.h
> new file mode 100644
> index 000000000000..484628460057
> --- /dev/null
> +++ b/arch/powerpc/include/asm/tlbbatch.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ARCH_PPC_TLBBATCH_H
> +#define _ARCH_PPC_TLBBATCH_H
> +
> +struct arch_tlbflush_unmap_batch {
> +	/*
> +         *
> +	 */
> +};
> +
> +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> +{
> +}
> +
> +static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> +						struct mm_struct *mm,
> +						unsigned long uarddr)
> +{
> +}
> +
> +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
> +{
> +	/*ppc always do tlb flush in batch*/
> +	return false;
> +}
> +
> +static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)
> +{
> +}
> +#endif /* _ARCH_PPC_TLBBATCH_H */
> -- 
> 2.42.0.windows.2

Re: [RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Luming Yu 1 year, 4 months ago

On Sun, Sep 22, 2024 at 04:39:53PM +0530, Ritesh Harjani wrote:
> Luming Yu <luming.yu@shingroup.cn> writes:
> 
> > From: Yu Luming <luming.yu@gmail.com>
> >
> > ppc always do its own tracking for batch tlb. By trivially enabling
> > the ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH in ppc, ppc arch can re-use
> > common code in rmap and reduce overhead and do optimization it could not
> > have without a tlb flushing context at low architecture level.
> 
> I looked at this patch and other than the compile failure, this patch
> still won't optimize anything. The idea of this config is that we want
> to batch all the tlb flush operation at the end. By returning false from
> should_defer_flush() (in this patch), we are saying we cannot defer
> the flush and hence we do tlb flush in the same context of unmap.
Not exactly. As the false return implies, we currently do nothing here and
rely on Book3S-64's TLB batch implementation, which already contains a bit
of deferral optimization. We need a real benchmark to characterize its
performance.

And I need to get my test bed ready for patch testing first. So I have to
defer the real optimization in this area.
> 
> Anyway, I took a quick look at ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> and I have a quick PoC for the same. I will soon post it.
Thanks for picking up the baton for future collaboration on the
potential common performance benefits for the powerpc arch.
> 
> -ritesh
> 
> >
> > Signed-off-by: Luming Yu <luming.yu@shingroup.cn>
> > ---
> >  arch/powerpc/Kconfig                |  1 +
> >  arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++++++++
> >  2 files changed, 31 insertions(+)
> >  create mode 100644 arch/powerpc/include/asm/tlbbatch.h
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index e94e7e4bfd40..e6db84dd014a 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -175,6 +175,7 @@ config PPC
> >  	select ARCH_WANT_IPC_PARSE_VERSION
> >  	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
> >  	select ARCH_WANT_LD_ORPHAN_WARN
> > +	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> >  	select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP	if PPC_RADIX_MMU
> >  	select ARCH_WANTS_MODULES_DATA_IN_VMALLOC	if PPC_BOOK3S_32 || PPC_8xx
> >  	select ARCH_WEAK_RELEASE_ACQUIRE
> > diff --git a/arch/powerpc/include/asm/tlbbatch.h b/arch/powerpc/include/asm/tlbbatch.h
> > new file mode 100644
> > index 000000000000..484628460057
> > --- /dev/null
> > +++ b/arch/powerpc/include/asm/tlbbatch.h
> > @@ -0,0 +1,30 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ARCH_PPC_TLBBATCH_H
> > +#define _ARCH_PPC_TLBBATCH_H
> > +
> > +struct arch_tlbflush_unmap_batch {
> > +	/*
> > +         *
> > +	 */
> > +};
> > +
> > +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> > +{
> > +}
> > +
> > +static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > +						struct mm_struct *mm,
> > +						unsigned long uarddr)
> > +{
> > +}
> > +
> > +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
> > +{
> > +	/*ppc always do tlb flush in batch*/
> > +	return false;
> > +}
> > +
> > +static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)
> > +{
> > +}
> > +#endif /* _ARCH_PPC_TLBBATCH_H */
> > -- 
> > 2.42.0.windows.2
>

Re: [RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Ritesh Harjani (IBM) 1 year, 4 months ago

Luming Yu <luming.yu@shingroup.cn> writes:

> On Sun, Sep 22, 2024 at 04:39:53PM +0530, Ritesh Harjani wrote:
>> Luming Yu <luming.yu@shingroup.cn> writes:
>> 
>> > From: Yu Luming <luming.yu@gmail.com>
>> >
>> > ppc always do its own tracking for batch tlb. By trivially enabling
>> > the ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH in ppc, ppc arch can re-use
>> > common code in rmap and reduce overhead and do optimization it could not
>> > have without a tlb flushing context at low architecture level.
>> 
>> I looked at this patch and other than the compile failure, this patch
>> still won't optimize anything. The idea of this config is that we want
>> to batch all the tlb flush operation at the end. By returning false from
>> should_defer_flush() (in this patch), we are saying we cannot defer
>> the flush and hence we do tlb flush in the same context of unmap.
> not exactly, as false return implies, we currently do nothing but relying on
> book3S_64's tlb batch implementation which contains a bit of defer optimization
> that we need to use a real benchmark to do some performance characterization.
>
> And I need to get my test bed ready for patch testing first. So I have to
> defer the real optimization in this area.
>> 
>> Anyway, I took a quick look at ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
>> and I have a quick PoC for the same. I will soon post it.
> thanks for picking up the baton for the future collaboration on the
> potential common performance benefits among us for powerpc arch.

Sure, thanks Luming.
I have posted this work here [1].

[1]: https://lore.kernel.org/linuxppc-dev/cover.1727001426.git.ritesh.list@gmail.com/
-ritesh

Re: [RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Michael Ellerman 1 year, 4 months ago

Luming Yu <luming.yu@shingroup.cn> writes:
> From: Yu Luming <luming.yu@gmail.com>
>
> ppc always do its own tracking for batch tlb.

I don't think it does? :)

I think you're referring to the batch handling in 
arch/powerpc/include/asm/book3s/64/tlbflush-hash.h ?

But that's only used for 64-bit Book3S with the HPT MMU.

> By trivially enabling
> the ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH in ppc, ppc arch can re-use
> common code in rmap and reduce overhead and do optimization it could not
> have without a tlb flushing context at low architecture level.
>
> Signed-off-by: Luming Yu <luming.yu@shingroup.cn>
> ---
>  arch/powerpc/Kconfig                |  1 +
>  arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/tlbbatch.h

This doesn't build:

  https://github.com/linuxppc/linux-snowpatch/actions/runs/10919442655

Can you please follow the instructions here:

  https://github.com/linuxppc/wiki/wiki/Testing-with-GitHub-Actions

They describe how to fork our CI tree, which has GitHub Actions
preconfigured. You can then apply your patches on top and push to GitHub,
and it will do some test builds for you. Notably it will do 32-bit
builds, which is what broke here.

cheers

Re: [RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Luming Yu 1 year, 4 months ago

On Thu, Sep 19, 2024 at 01:22:21PM +1000, Michael Ellerman wrote:
> Luming Yu <luming.yu@shingroup.cn> writes:
> > From: Yu Luming <luming.yu@gmail.com>
> >
> > ppc always do its own tracking for batch tlb.
> 
> I don't think it does? :)
> 
> I think you're referring to the batch handling in 
> arch/powerpc/include/asm/book3s/64/tlbflush-hash.h ?
> 
> But that's only used for 64-bit Book3S with the HPT MMU.
> 
> > By trivially enabling
> > the ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH in ppc, ppc arch can re-use
> > common code in rmap and reduce overhead and do optimization it could not
> > have without a tlb flushing context at low architecture level.
> >
> > Signed-off-by: Luming Yu <luming.yu@shingroup.cn>
> > ---
> >  arch/powerpc/Kconfig                |  1 +
> >  arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++++++++
> >  2 files changed, 31 insertions(+)
> >  create mode 100644 arch/powerpc/include/asm/tlbbatch.h
> 
> This doesn't build:
> 
>   https://github.com/linuxppc/linux-snowpatch/actions/runs/10919442655
> 
> Can you please follow the instructions here:
> 
>   https://github.com/linuxppc/wiki/wiki/Testing-with-GitHub-Actions
> 
> Which describe how to fork our CI tree that has Github Actions
> preconfigured, then you can apply your patches on top and push to github
> and it will do some test builds for you. Notably it will do 32-bit
> builds which is what broke here.
Thanks, I will take a look and do this for the next patch before posting
to the mailing list. :-)
Ideally it should also include QEMU boot tests for targets that must work.
I think we may also need a powerpc Yocto recipe to make patch testing more
customizable and reproducible than with a Fedora/Debian distro. I've been
searching for one for a while, but I couldn't find a useful one. Maybe I
need to come up with one of my own to support the CI test bot idea.
> 
> cheers
>

Re: [RFC PATCH] powerpc/tlb: enable arch want batched unmap tlb flush

Posted by Michael Ellerman 1 year, 4 months ago

Luming Yu <luming.yu@shingroup.cn> writes:
> On Thu, Sep 19, 2024 at 01:22:21PM +1000, Michael Ellerman wrote:
>> Luming Yu <luming.yu@shingroup.cn> writes:
>> > From: Yu Luming <luming.yu@gmail.com>
>> >
>> > ppc always do its own tracking for batch tlb.
>> 
>> I don't think it does? :)
>> 
>> I think you're referring to the batch handling in 
>> arch/powerpc/include/asm/book3s/64/tlbflush-hash.h ?
>> 
>> But that's only used for 64-bit Book3S with the HPT MMU.
>> 
>> > By trivially enabling
>> > the ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH in ppc, ppc arch can re-use
>> > common code in rmap and reduce overhead and do optimization it could not
>> > have without a tlb flushing context at low architecture level.
>> >
>> > Signed-off-by: Luming Yu <luming.yu@shingroup.cn>
>> > ---
>> >  arch/powerpc/Kconfig                |  1 +
>> >  arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++++++++
>> >  2 files changed, 31 insertions(+)
>> >  create mode 100644 arch/powerpc/include/asm/tlbbatch.h
>> 
>> This doesn't build:
>> 
>>   https://github.com/linuxppc/linux-snowpatch/actions/runs/10919442655
>> 
>> Can you please follow the instructions here:
>> 
>>   https://github.com/linuxppc/wiki/wiki/Testing-with-GitHub-Actions
>> 
>> Which describe how to fork our CI tree that has Github Actions
>> preconfigured, then you can apply your patches on top and push to github
>> and it will do some test builds for you. Notably it will do 32-bit
>> builds which is what broke here.

> thanks, I will take a look and do this for next patch before posting on mailing list. :-)
> 
> Ideally it should also include qemu boot tests for targets that must work.
 
Those scripts do qemu boots of pseries p8/p9, powernv p8/p9, 44x,
e5500, g5, and mac99.

They don't boot full distros because that's too slow for GitHub Actions,
so they don't catch all bugs, but it's better than nothing.

> I think we could also need a powerpc yocto recipe as well to make
> patch test more customizable
> and reproducible than fedora/Debian distro. I've been searching for it
> for a while, but I couldn't find a useful one. Maybe I need to come up
> one of my own to facilitate the ci test bot ideas.

I've never used Yocto, not sure if it does/did support powerpc.

Buildroot can build powerpc images with lots of packages included.

cheers