[PATCH v2 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb

Alexandre Ghiti posted 4 patches 2 years, 6 months ago
There is a newer version of this series
Posted by Alexandre Ghiti 2 years, 6 months ago
Currently, when the range to flush covers more than one page (a 4K page or
a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
tlb comes with a greater cost than flushing a single entry so we should
flush single entries up to a certain threshold so that:
threshold * cost of flushing a single entry < cost of flushing the whole
tlb.

This threshold is microarchitecture dependent and can/should be
overwritten by vendors.

Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/mm/tlbflush.c | 41 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 3e4acef1f6bc..8017d2130e27 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -24,13 +24,48 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
 			: "memory");
 }
 
+/*
+ * Flush entire TLB if number of entries to be flushed is greater
+ * than the threshold below. Platforms may override the threshold
+ * value based on marchid, mvendorid, and mimpid.
+ */
+static unsigned long tlb_flush_all_threshold __read_mostly = 64;
+
+static void local_flush_tlb_range_threshold_asid(unsigned long start,
+						 unsigned long size,
+						 unsigned long stride,
+						 unsigned long asid)
+{
+	u16 nr_ptes_in_range = DIV_ROUND_UP(size, stride);
+	int i;
+
+	if (nr_ptes_in_range > tlb_flush_all_threshold) {
+		if (asid != -1)
+			local_flush_tlb_all_asid(asid);
+		else
+			local_flush_tlb_all();
+		return;
+	}
+
+	for (i = 0; i < nr_ptes_in_range; ++i) {
+		if (asid != -1)
+			local_flush_tlb_page_asid(start, asid);
+		else
+			local_flush_tlb_page(start);
+		start += stride;
+	}
+}
+
 static inline void local_flush_tlb_range(unsigned long start,
 		unsigned long size, unsigned long stride)
 {
 	if (size <= stride)
 		local_flush_tlb_page(start);
-	else
+	else if (size == (unsigned long)-1)
 		local_flush_tlb_all();
+	else
+		local_flush_tlb_range_threshold_asid(start, size, stride, -1);
+
 }
 
 static inline void local_flush_tlb_range_asid(unsigned long start,
@@ -38,8 +73,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start,
 {
 	if (size <= stride)
 		local_flush_tlb_page_asid(start, asid);
-	else
+	else if (size == (unsigned long)-1)
 		local_flush_tlb_all_asid(asid);
+	else
+		local_flush_tlb_range_threshold_asid(start, size, stride, asid);
 }
 
 static void __ipi_flush_tlb_all(void *info)
-- 
2.39.2
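
For readers following the thread, the decision the patch makes can be modelled in user space. The sketch below is an illustration only, not kernel code: `plan_flush()`, `struct flush_plan` and `enum flush_kind` are hypothetical names introduced here, and the function only reports which kind of flush would be chosen instead of issuing any sfence.vma. It assumes the same inputs as the patch (size, stride, and a threshold such as 64).

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical types for this sketch; they do not exist in the patch. */
enum flush_kind { FLUSH_PAGES, FLUSH_ALL };

struct flush_plan {
	enum flush_kind kind;
	unsigned long nr_pages;	/* only meaningful for FLUSH_PAGES */
};

/*
 * Mirror the order of checks in the patch:
 *   1. a range of at most one stride is a single-entry flush,
 *   2. size == (unsigned long)-1 means "flush everything",
 *   3. otherwise count the entries and compare with the threshold.
 */
static struct flush_plan plan_flush(unsigned long size, unsigned long stride,
				    unsigned long threshold)
{
	struct flush_plan plan = { FLUSH_ALL, 0 };
	unsigned long nr;

	assert(stride != 0);

	if (size <= stride) {
		plan.kind = FLUSH_PAGES;
		plan.nr_pages = 1;
		return plan;
	}

	if (size == (unsigned long)-1)
		return plan;

	/*
	 * The posted patch stores this count in a u16; a wider type is
	 * used in this sketch so that large counts are not truncated.
	 */
	nr = DIV_ROUND_UP(size, stride);
	if (nr > threshold)
		return plan;

	plan.kind = FLUSH_PAGES;
	plan.nr_pages = nr;
	return plan;
}
```

With a 4K stride and a threshold of 64, a 64-page range is planned as 64 single-entry flushes, while a 65-page range falls back to a full flush.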
Re: [PATCH v2 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Posted by Conor Dooley 2 years, 6 months ago
On Thu, Jul 27, 2023 at 08:55:52PM +0200, Alexandre Ghiti wrote:
> Currently, when the range to flush covers more than one page (a 4K page or
> a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
> tlb comes with a greater cost than flushing a single entry so we should
> flush single entries up to a certain threshold so that:
> threshold * cost of flushing a single entry < cost of flushing the whole
> tlb.
> 

> This threshold is microarchitecture dependent and can/should be
> overwritten by vendors.

Please remove the latter part of this, as there is no infrastructure for
this at present, nor likely in the immediate future.

> Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/mm/tlbflush.c | 41 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 3e4acef1f6bc..8017d2130e27 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -24,13 +24,48 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
>  			: "memory");
>  }
>  
> +/*
> + * Flush entire TLB if number of entries to be flushed is greater
> + * than the threshold below.

>     Platforms may override the threshold
> + * value based on marchid, mvendorid, and mimpid.

And this too: as there is no infrastructure for this, the comment is
misleading. This kind of thing should only be added when there is
actually a mechanism for doing so.

I did say I would think about how to do this, but I have not come up
with something. I dislike using the marchid/mvendorid/mimpid stuff if we
can avoid it, as there's no control over what actually gets put in there,
especially if people are going to use the open source cores.

Do we even, unless under extreme duress, want to allow setting custom
values here via firmware? Sounds like a recipe for 1200 different
alternatives or a big LUT...
Re: [PATCH v2 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Posted by Andrew Jones 2 years, 6 months ago
On Thu, Jul 27, 2023 at 08:55:52PM +0200, Alexandre Ghiti wrote:
> Currently, when the range to flush covers more than one page (a 4K page or
> a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
> tlb comes with a greater cost than flushing a single entry so we should
> flush single entries up to a certain threshold so that:
> threshold * cost of flushing a single entry < cost of flushing the whole
> tlb.
> 
> This threshold is microarchitecture dependent and can/should be
> overwritten by vendors.
> 
> Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/mm/tlbflush.c | 41 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 3e4acef1f6bc..8017d2130e27 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -24,13 +24,48 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
>  			: "memory");
>  }
>  
> +/*
> + * Flush entire TLB if number of entries to be flushed is greater
> + * than the threshold below. Platforms may override the threshold
> + * value based on marchid, mvendorid, and mimpid.
> + */
> +static unsigned long tlb_flush_all_threshold __read_mostly = 64;
> +
> +static void local_flush_tlb_range_threshold_asid(unsigned long start,
> +						 unsigned long size,
> +						 unsigned long stride,
> +						 unsigned long asid)
> +{
> +	u16 nr_ptes_in_range = DIV_ROUND_UP(size, stride);
> +	int i;
> +
> +	if (nr_ptes_in_range > tlb_flush_all_threshold) {
> +		if (asid != -1)
> +			local_flush_tlb_all_asid(asid);
> +		else
> +			local_flush_tlb_all();
> +		return;
> +	}
> +
> +	for (i = 0; i < nr_ptes_in_range; ++i) {
> +		if (asid != -1)
> +			local_flush_tlb_page_asid(start, asid);
> +		else
> +			local_flush_tlb_page(start);
> +		start += stride;
> +	}
> +}
> +
>  static inline void local_flush_tlb_range(unsigned long start,
>  		unsigned long size, unsigned long stride)
>  {
>  	if (size <= stride)
>  		local_flush_tlb_page(start);
> -	else
> +	else if (size == (unsigned long)-1)

The more we scatter this -1 around, especially now that we also need to
cast it, the more I think we should introduce a #define for it.

>  		local_flush_tlb_all();
> +	else
> +		local_flush_tlb_range_threshold_asid(start, size, stride, -1);
> +
>  }
>  
>  static inline void local_flush_tlb_range_asid(unsigned long start,
> @@ -38,8 +73,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start,
>  {
>  	if (size <= stride)
>  		local_flush_tlb_page_asid(start, asid);
> -	else
> +	else if (size == (unsigned long)-1)
>  		local_flush_tlb_all_asid(asid);
> +	else
> +		local_flush_tlb_range_threshold_asid(start, size, stride, asid);
>  }
>  
>  static void __ipi_flush_tlb_all(void *info)
> -- 
> 2.39.2
>

Otherwise,

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew
Re: [PATCH v2 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Posted by Conor Dooley 2 years, 6 months ago
On Fri, Jul 28, 2023 at 03:32:35PM +0200, Andrew Jones wrote:
> On Thu, Jul 27, 2023 at 08:55:52PM +0200, Alexandre Ghiti wrote:

> > +	else if (size == (unsigned long)-1)
> 
> The more we scatter this -1 around, especially now that we also need to
> cast it, the more I think we should introduce a #define for it.

Please.
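
A minimal sketch of what the requested #define could look like follows. The macro and helper names (`FLUSH_TLB_MAX_SIZE`, `FLUSH_TLB_NO_ASID`, `flush_covers_everything()`, `flush_has_asid()`, `sentinel_examples()`) are assumptions chosen for illustration and do not appear in the posted patch; the only thing taken from the thread is that the bare -1, and its cast, would be hidden behind a name.

```c
#include <assert.h>

/* Hypothetical names; the posted patch open-codes (unsigned long)-1. */
#define FLUSH_TLB_MAX_SIZE	((unsigned long)-1)	/* "flush everything" size */
#define FLUSH_TLB_NO_ASID	((unsigned long)-1)	/* "no ASID given" marker */

/* True when the caller asked for the whole address space. */
static int flush_covers_everything(unsigned long size)
{
	return size == FLUSH_TLB_MAX_SIZE;
}

/* True when a real ASID was passed rather than the sentinel. */
static int flush_has_asid(unsigned long asid)
{
	return asid != FLUSH_TLB_NO_ASID;
}

/* Small usage example exercising both helpers. */
void sentinel_examples(void)
{
	assert(flush_covers_everything(FLUSH_TLB_MAX_SIZE));
	assert(!flush_covers_everything(4096UL));
	assert(flush_has_asid(0UL));
	assert(!flush_has_asid(FLUSH_TLB_NO_ASID));
}
```

Keeping the cast inside the macro means call sites compare against a named constant of the right type, which also avoids the mixed signed/unsigned comparison of `asid != -1`.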