[PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops

Zhongqiu Han posted 1 patch 12 months ago
drivers/tty/mips_ejtag_fdc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Zhongqiu Han 12 months ago
It is considered good practice to call cpu_relax() in busy loops, see
Documentation/process/volatile-considered-harmful.rst. This can lower CPU
power consumption or yield to a hyperthreaded twin processor, or serve as
a compiler barrier. In addition, if something goes wrong in the busy loop
at least it can prevent things from getting worse.

Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
---
 drivers/tty/mips_ejtag_fdc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
index afbf7738c7c4..b17ead1e9698 100644
--- a/drivers/tty/mips_ejtag_fdc.c
+++ b/drivers/tty/mips_ejtag_fdc.c
@@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct console *c, const char *s,
 
 		/* Busy wait until there's space in fifo */
 		while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
-			;
+			cpu_relax();
 		__raw_writel(word.word, regs + REG_FDTX(c->index));
 	}
 out:
@@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
 
 	/* Busy wait until there's space in fifo */
 	while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
-		;
+		cpu_relax();
 	__raw_writel(word.word,
 		     regs + REG_FDTX(CONFIG_MIPS_EJTAG_FDC_KGDB_CHAN));
 }
-- 
2.25.1
Re: [PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Jiri Slaby 12 months ago
On 19. 12. 24, 13:42, Zhongqiu Han wrote:
> It is considered good practice to call cpu_relax() in busy loops, see
> Documentation/process/volatile-considered-harmful.rst. This can lower CPU
> power consumption or yield to a hyperthreaded twin processor, or serve as
> a compiler barrier. In addition, if something goes wrong in the busy loop
> at least it can prevent things from getting worse.
> 
> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
> ---
>   drivers/tty/mips_ejtag_fdc.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
> index afbf7738c7c4..b17ead1e9698 100644
> --- a/drivers/tty/mips_ejtag_fdc.c
> +++ b/drivers/tty/mips_ejtag_fdc.c
> @@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct console *c, const char *s,
>   
>   		/* Busy wait until there's space in fifo */
>   		while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
> -			;
> +			cpu_relax();
>   		__raw_writel(word.word, regs + REG_FDTX(c->index));
>   	}
>   out:
> @@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
>   
>   	/* Busy wait until there's space in fifo */
>   	while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
> -		;
> +		cpu_relax();

Can this instead be switched to read_poll_timeout_atomic() or alike? 
Those already contain cpu_relax(), of course...

thanks,
-- 
js
suse labs
Re: [PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Zhongqiu Han 12 months ago
On 12/20/2024 3:16 PM, Jiri Slaby wrote:
> On 19. 12. 24, 13:42, Zhongqiu Han wrote:
>> It is considered good practice to call cpu_relax() in busy loops, see
>> Documentation/process/volatile-considered-harmful.rst. This can lower CPU
>> power consumption or yield to a hyperthreaded twin processor, or serve as
>> a compiler barrier. In addition, if something goes wrong in the busy loop
>> at least it can prevent things from getting worse.
>>
>> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
>> ---
>>   drivers/tty/mips_ejtag_fdc.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
>> index afbf7738c7c4..b17ead1e9698 100644
>> --- a/drivers/tty/mips_ejtag_fdc.c
>> +++ b/drivers/tty/mips_ejtag_fdc.c
>> @@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct 
>> console *c, const char *s,
>>           /* Busy wait until there's space in fifo */
>>           while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
>> -            ;
>> +            cpu_relax();
>>           __raw_writel(word.word, regs + REG_FDTX(c->index));
>>       }
>>   out:
>> @@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
>>       /* Busy wait until there's space in fifo */
>>       while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
>> -        ;
>> +        cpu_relax();
> 
> Can this instead be switched to read_poll_timeout_atomic() or alike? 
> Those already contain cpu_relax(), of course...
> 
> thanks,

Hi Jiri,
Thanks a lot for the review!

yeah, maybe we can consider read_poll_timeout_atomic() or alike.

The implementation of read_poll_timeout_atomic() provides a precise
customization of the address busy read poll behavior by calling udelay()
and cpu_relax(), and using a timeout threshold. However, in this case,
it seems we might not need to customize the poll behavior. Since
udelay() only consumes CPU cycles, perhaps cpu_relax() is sufficient?
And if it times out, we still need to keep retrying until the data is
read. My initial thought was to call cpu_relax() to save power or act as
a memory barrier. As I mentioned before in my email to Greg, certain
MIPS-based architectures, such as Loongson-3, should requirecpu_relax().
Thanks~




-- 
Thx and BRs,
Zhongqiu Han
Re: [PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Greg KH 12 months ago
On Thu, Dec 19, 2024 at 08:42:54PM +0800, Zhongqiu Han wrote:
> It is considered good practice to call cpu_relax() in busy loops, see
> Documentation/process/volatile-considered-harmful.rst. This can lower CPU
> power consumption or yield to a hyperthreaded twin processor, or serve as
> a compiler barrier. In addition, if something goes wrong in the busy loop
> at least it can prevent things from getting worse.
> 
> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
> ---
>  drivers/tty/mips_ejtag_fdc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
> index afbf7738c7c4..b17ead1e9698 100644
> --- a/drivers/tty/mips_ejtag_fdc.c
> +++ b/drivers/tty/mips_ejtag_fdc.c
> @@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct console *c, const char *s,
>  
>  		/* Busy wait until there's space in fifo */
>  		while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
> -			;
> +			cpu_relax();
>  		__raw_writel(word.word, regs + REG_FDTX(c->index));
>  	}
>  out:
> @@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
>  
>  	/* Busy wait until there's space in fifo */
>  	while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
> -		;
> +		cpu_relax();

How did you test this?  Are you _sure_ it is needed at all?  I think you
just made these loops take a lot longer than before :(

Have you had problems with these tight loops doing anything bad to your
system?

thanks,

greg k-h
Re: [PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Zhongqiu Han 12 months ago
On 12/19/2024 9:28 PM, Greg KH wrote:
> On Thu, Dec 19, 2024 at 08:42:54PM +0800, Zhongqiu Han wrote:
>> It is considered good practice to call cpu_relax() in busy loops, see
>> Documentation/process/volatile-considered-harmful.rst. This can lower CPU
>> power consumption or yield to a hyperthreaded twin processor, or serve as
>> a compiler barrier. In addition, if something goes wrong in the busy loop
>> at least it can prevent things from getting worse.
>>
>> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
>> ---
>>   drivers/tty/mips_ejtag_fdc.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
>> index afbf7738c7c4..b17ead1e9698 100644
>> --- a/drivers/tty/mips_ejtag_fdc.c
>> +++ b/drivers/tty/mips_ejtag_fdc.c
>> @@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct console *c, const char *s,
>>   
>>   		/* Busy wait until there's space in fifo */
>>   		while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
>> -			;
>> +			cpu_relax();
>>   		__raw_writel(word.word, regs + REG_FDTX(c->index));
>>   	}
>>   out:
>> @@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
>>   
>>   	/* Busy wait until there's space in fifo */
>>   	while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
>> -		;
>> +		cpu_relax();
> 
> How did you test this?  Are you _sure_ it is needed at all?  I think you
> just made these loops take a lot longer than before :(
> 
> Have you had problems with these tight loops doing anything bad to your
> system?
> 
> thanks,
> 
> greg k-h

Hi Greg,
Thanks a lot for the review~

Perhaps I should submit an RFC patch and explain the situation, as I
don't have a MIPS device for testing. Indeed, the cpu_relax()
implementation for MIPS is a memory barrier, which, compared to busy
waiting, doesn't save power and can make loops slower than before.
However, according to its definition file, for certain MIPS-based
architectures like Loongarch-3, it can help force the Loongson-3 SFB
(Store-Fill-Buffer) flush to avoid pending writes. Below is the
implementation of cpu_relax() for the MIPS architecture and its
comments.

-----------------------------------------------------------------
arch/mips/include/asm/vdso/processor.h

#ifdef CONFIG_CPU_LOONGSON64
/*
  * Loongson-3's SFB (Store-Fill-Buffer) may buffer writes indefinitely
  * when a tight read loop is executed, because reads take priority over
  * writes & the hardware (incorrectly) doesn't ensure that writes will
  * eventually occur.
  *
  * Since spin loops of any kind should have a cpu_relax() in them, force
  * an SFB flush from cpu_relax() such that any pending writes will
  * become visible as expected.
  */
#define cpu_relax()	smp_mb()
#else
#define cpu_relax()	barrier()
#endif
----------------------------------------------------------------

Based on this, cpu_relax() should be needed here? :)

Thank you~


-- 
Thx and BRs,
Zhongqiu Han
Re: [PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Greg KH 12 months ago
On Fri, Dec 20, 2024 at 09:46:27PM +0800, Zhongqiu Han wrote:
> On 12/19/2024 9:28 PM, Greg KH wrote:
> > On Thu, Dec 19, 2024 at 08:42:54PM +0800, Zhongqiu Han wrote:
> > > It is considered good practice to call cpu_relax() in busy loops, see
> > > Documentation/process/volatile-considered-harmful.rst. This can lower CPU
> > > power consumption or yield to a hyperthreaded twin processor, or serve as
> > > a compiler barrier. In addition, if something goes wrong in the busy loop
> > > at least it can prevent things from getting worse.
> > > 
> > > Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
> > > ---
> > >   drivers/tty/mips_ejtag_fdc.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
> > > index afbf7738c7c4..b17ead1e9698 100644
> > > --- a/drivers/tty/mips_ejtag_fdc.c
> > > +++ b/drivers/tty/mips_ejtag_fdc.c
> > > @@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct console *c, const char *s,
> > >   		/* Busy wait until there's space in fifo */
> > >   		while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
> > > -			;
> > > +			cpu_relax();
> > >   		__raw_writel(word.word, regs + REG_FDTX(c->index));
> > >   	}
> > >   out:
> > > @@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
> > >   	/* Busy wait until there's space in fifo */
> > >   	while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
> > > -		;
> > > +		cpu_relax();
> > 
> > How did you test this?  Are you _sure_ it is needed at all?  I think you
> > just made these loops take a lot longer than before :(
> > 
> > Have you had problems with these tight loops doing anything bad to your
> > system?
> > 
> > thanks,
> > 
> > greg k-h
> 
> Hi Greg,
> Thanks a lot for the review~
> 
> Perhaps I should submit an RFC patch and explain the situation, as I
> don't have a MIPS device for testing. Indeed, the cpu_relax()
> implementation for MIPS is a memory barrier, which, compared to busy
> waiting, doesn't save power and can make loops slower than before.
> However, according to its definition file, for certain MIPS-based
> architectures like Loongarch-3, it can help force the Loongson-3 SFB
> (Store-Fill-Buffer) flush to avoid pending writes. Below is the
> implementation of cpu_relax() for the MIPS architecture and its
> comments.
> 
> -----------------------------------------------------------------
> arch/mips/include/asm/vdso/processor.h
> 
> #ifdef CONFIG_CPU_LOONGSON64
> /*
>  * Loongson-3's SFB (Store-Fill-Buffer) may buffer writes indefinitely
>  * when a tight read loop is executed, because reads take priority over
>  * writes & the hardware (incorrectly) doesn't ensure that writes will
>  * eventually occur.
>  *
>  * Since spin loops of any kind should have a cpu_relax() in them, force
>  * an SFB flush from cpu_relax() such that any pending writes will
>  * become visible as expected.
>  */
> #define cpu_relax()	smp_mb()
> #else
> #define cpu_relax()	barrier()
> #endif
> ----------------------------------------------------------------
> 
> Based on this, cpu_relax() should be needed here? :)

I don't know, please test and let us know!

Without testing of this on real hardware, we can't take this change for
obvious reasons.

thanks,

greg k-h
Re: [PATCH] tty: mips_ejtag_fdc: Call cpu_relax() in registers polling busy loops
Posted by Zhongqiu Han 11 months, 3 weeks ago
On 12/20/2024 11:06 PM, Greg KH wrote:
> On Fri, Dec 20, 2024 at 09:46:27PM +0800, Zhongqiu Han wrote:
>> On 12/19/2024 9:28 PM, Greg KH wrote:
>>> On Thu, Dec 19, 2024 at 08:42:54PM +0800, Zhongqiu Han wrote:
>>>> It is considered good practice to call cpu_relax() in busy loops, see
>>>> Documentation/process/volatile-considered-harmful.rst. This can lower CPU
>>>> power consumption or yield to a hyperthreaded twin processor, or serve as
>>>> a compiler barrier. In addition, if something goes wrong in the busy loop
>>>> at least it can prevent things from getting worse.
>>>>
>>>> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
>>>> ---
>>>>    drivers/tty/mips_ejtag_fdc.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
>>>> index afbf7738c7c4..b17ead1e9698 100644
>>>> --- a/drivers/tty/mips_ejtag_fdc.c
>>>> +++ b/drivers/tty/mips_ejtag_fdc.c
>>>> @@ -346,7 +346,7 @@ static void mips_ejtag_fdc_console_write(struct console *c, const char *s,
>>>>    		/* Busy wait until there's space in fifo */
>>>>    		while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
>>>> -			;
>>>> +			cpu_relax();
>>>>    		__raw_writel(word.word, regs + REG_FDTX(c->index));
>>>>    	}
>>>>    out:
>>>> @@ -1233,7 +1233,7 @@ static void kgdbfdc_push_one(void)
>>>>    	/* Busy wait until there's space in fifo */
>>>>    	while (__raw_readl(regs + REG_FDSTAT) & REG_FDSTAT_TXF)
>>>> -		;
>>>> +		cpu_relax();
>>>
>>> How did you test this?  Are you _sure_ it is needed at all?  I think you
>>> just made these loops take a lot longer than before :(
>>>
>>> Have you had problems with these tight loops doing anything bad to your
>>> system?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Hi Greg,
>> Thanks a lot for the review~
>>
>> Perhaps I should submit an RFC patch and explain the situation, as I
>> don't have a MIPS device for testing. Indeed, the cpu_relax()
>> implementation for MIPS is a memory barrier, which, compared to busy
>> waiting, doesn't save power and can make loops slower than before.
>> However, according to its definition file, for certain MIPS-based
>> architectures like Loongarch-3, it can help force the Loongson-3 SFB
>> (Store-Fill-Buffer) flush to avoid pending writes. Below is the
>> implementation of cpu_relax() for the MIPS architecture and its
>> comments.
>>
>> -----------------------------------------------------------------
>> arch/mips/include/asm/vdso/processor.h
>>
>> #ifdef CONFIG_CPU_LOONGSON64
>> /*
>>   * Loongson-3's SFB (Store-Fill-Buffer) may buffer writes indefinitely
>>   * when a tight read loop is executed, because reads take priority over
>>   * writes & the hardware (incorrectly) doesn't ensure that writes will
>>   * eventually occur.
>>   *
>>   * Since spin loops of any kind should have a cpu_relax() in them, force
>>   * an SFB flush from cpu_relax() such that any pending writes will
>>   * become visible as expected.
>>   */
>> #define cpu_relax()	smp_mb()
>> #else
>> #define cpu_relax()	barrier()
>> #endif
>> ----------------------------------------------------------------
>>
>> Based on this, cpu_relax() should be needed here? :)
> 
> I don't know, please test and let us know!
> 
> Without testing of this on real hardware, we can't take this change for
> obvious reasons.

Hi Greg,
Sorry for the delay reply.

Sure, I will conduct comparative testing if I have the mips device in
the future, or other developers are also welcome to participate in the
testing.

> 
> thanks,
> 
> greg k-h


-- 
Thx and BRs,
Zhongqiu Han