[PATCH] tracing: Mitigate possible softlockup in __tracing_open()
Posted by Zheng Yejian 1 year, 3 months ago
In __tracing_open(), when a max latency tracer has run on a cpu, the
start time of its buffer will have been updated, and event entries with
timestamps earlier than the start of the buffer are skipped (see
tracing_iter_reset()).

A softlockup can occur if the kernel is non-preemptible and too many
entries are skipped in the loop that resets every cpu buffer, so add
cond_resched() to avoid it.

Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
---
 kernel/trace/trace.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebe7ce2f5f4a..88faa95b457b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot)
 		for_each_tracing_cpu(cpu) {
 			ring_buffer_read_start(iter->buffer_iter[cpu]);
 			tracing_iter_reset(iter, cpu);
+			/*
+			 * When a max latency tracer has run on this cpu, the start
+			 * time of its buffer will have been updated, and event entries
+			 * with timestamps earlier than the start of the buffer are
+			 * skipped (see tracing_iter_reset()). A softlockup can occur
+			 * if the kernel is non-preemptible and too many entries are
+			 * skipped in this loop, so add cond_resched() to mitigate it.
+			 */
+			cond_resched();
 		}
 	} else {
 		cpu = iter->cpu_file;
-- 
2.25.1
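
For illustration, the report is easy to picture from userspace (a
hypothetical reproduction sketch; it assumes tracefs mounted at
/sys/kernel/tracing and that a max latency tracer such as wakeup has
been active long enough to leave many stale per-cpu entries behind):

/*
 * Hypothetical reproduction sketch: opening the trace file is enough,
 * since __tracing_open() resets the iterator for every cpu buffer and
 * skips all stale entries while doing so.
 */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/sys/kernel/tracing/trace", O_RDONLY);

	if (fd < 0)
		return 1;
	/*
	 * On a non-preemptible kernel, the softlockup (if any) fires
	 * during this open() while the stale entries are skipped.
	 */
	close(fd);
	return 0;
}
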
Re: [PATCH] tracing: Mitigate possible softlockup in __tracing_open()
Posted by Steven Rostedt 1 year, 3 months ago
On Sat, 24 Aug 2024 11:03:43 +0800
Zheng Yejian <zhengyejian@huaweicloud.com> wrote:

> In __tracing_open(), when a max latency tracer has run on a cpu, the
> start time of its buffer will have been updated, and event entries with
> timestamps earlier than the start of the buffer are skipped (see
> tracing_iter_reset()).
> 
> A softlockup can occur if the kernel is non-preemptible and too many
> entries are skipped in the loop that resets every cpu buffer, so add
> cond_resched() to avoid it.
> 
> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
> ---
>  kernel/trace/trace.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index ebe7ce2f5f4a..88faa95b457b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot)
>  		for_each_tracing_cpu(cpu) {
>  			ring_buffer_read_start(iter->buffer_iter[cpu]);
>  			tracing_iter_reset(iter, cpu);
> +			/*
> +			 * When a max latency tracer has run on this cpu, the start
> +			 * time of its buffer will have been updated, and event entries
> +			 * with timestamps earlier than the start of the buffer are
> +			 * skipped (see tracing_iter_reset()). A softlockup can occur
> +			 * if the kernel is non-preemptible and too many entries are
> +			 * skipped in this loop, so add cond_resched() to mitigate it.
> +			 */
> +			cond_resched();

This is the wrong place to put this. If the problem is with
tracing_iter_reset(), then add it there.

	while (ring_buffer_iter_peek(buf_iter, &ts)) {
		if (ts >= iter->array_buffer->time_start)
			break;
		entries++;
		ring_buffer_iter_advance(buf_iter);
		/* This could be a big loop */
		cond_resched();
	}

-- Steve



>  		}
>  	} else {
>  		cpu = iter->cpu_file;
Re: [PATCH] tracing: Mitigate possible softlockup in __tracing_open()
Posted by Zheng Yejian 1 year, 3 months ago
On 2024/8/26 22:35, Steven Rostedt wrote:
> On Sat, 24 Aug 2024 11:03:43 +0800
> Zheng Yejian <zhengyejian@huaweicloud.com> wrote:
> 
>> In __tracing_open(), when a max latency tracer has run on a cpu, the
>> start time of its buffer will have been updated, and event entries with
>> timestamps earlier than the start of the buffer are skipped (see
>> tracing_iter_reset()).
>>
>> A softlockup can occur if the kernel is non-preemptible and too many
>> entries are skipped in the loop that resets every cpu buffer, so add
>> cond_resched() to avoid it.
>>
>> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
>> ---
>>   kernel/trace/trace.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index ebe7ce2f5f4a..88faa95b457b 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot)
>>   		for_each_tracing_cpu(cpu) {
>>   			ring_buffer_read_start(iter->buffer_iter[cpu]);
>>   			tracing_iter_reset(iter, cpu);
>> +			/*
>> +			 * When a max latency tracer has run on this cpu, the start
>> +			 * time of its buffer will have been updated, and event entries
>> +			 * with timestamps earlier than the start of the buffer are
>> +			 * skipped (see tracing_iter_reset()). A softlockup can occur
>> +			 * if the kernel is non-preemptible and too many entries are
>> +			 * skipped in this loop, so add cond_resched() to mitigate it.
>> +			 */
>> +			cond_resched();
> 
> This is the wrong place to put this. If the problem is with
> tracing_iter_reset(), then add it there.
> 
> 	while (ring_buffer_iter_peek(buf_iter, &ts)) {
> 		if (ts >= iter->array_buffer->time_start)
> 			break;
> 		entries++;
> 		ring_buffer_iter_advance(buf_iter);
> 		/* This could be a big loop */
> 		cond_resched();

If we do the resched check on every event peek, I'm afraid there will
be too many reschedules.

> 	}
> 
> -- Steve
> 
> 
> 
>>   		}
>>   	} else {
>>   		cpu = iter->cpu_file;
> 

-- 
Thanks,
Zheng Yejian
Re: [PATCH] tracing: Mitigate possible softlockup in __tracing_open()
Posted by Steven Rostedt 1 year, 3 months ago
On Tue, 27 Aug 2024 09:29:55 +0800
Zheng Yejian <zhengyejian@huaweicloud.com> wrote:

> > This is the wrong place to put this. If the problem is with
> > tracing_iter_reset(), then add it there.
> > 
> > 	while (ring_buffer_iter_peek(buf_iter, &ts)) {
> > 		if (ts >= iter->array_buffer->time_start)
> > 			break;
> > 		entries++;
> > 		ring_buffer_iter_advance(buf_iter);
> > 		/* This could be a big loop */
> > 		cond_resched();  
> 
> Do resched check every event peek, I'm afraid there will
> be too many 're-schedule'.

cond_resched(), as the name suggests, only reschedules if NEED_RESCHED
is set. This isn't a fast path (obviously, as it requires a
cond_resched()).
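
Conceptually, on a non-preemptible kernel it boils down to something
like the sketch below (simplified; not the actual implementation, which
lives behind <linux/sched.h> and the preemption config):

	/*
	 * Simplified conceptual sketch of cond_resched() -- NOT the
	 * actual kernel implementation.
	 */
	static inline int cond_resched_sketch(void)
	{
		if (need_resched()) {	/* scheduler set TIF_NEED_RESCHED? */
			schedule();	/* voluntarily yield the CPU */
			return 1;
		}
		return 0;		/* common case: a cheap flag test */
	}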

-- Steve
[PATCH v2] tracing: Avoid possible softlockup in tracing_iter_reset()
Posted by Zheng Yejian 1 year, 3 months ago
In __tracing_open(), when a max latency tracer has run on a cpu, the
start time of its buffer will have been updated, and event entries with
timestamps earlier than the start of the buffer are skipped (see
tracing_iter_reset()).

A softlockup can occur if the kernel is non-preemptible and too many
entries are skipped in the loop that resets every cpu buffer, so add
cond_resched() to avoid it.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
---
 kernel/trace/trace.c | 2 ++
 1 file changed, 2 insertions(+)

v2:
  - Change to add cond_resched() in tracing_iter_reset()
    Link: https://lore.kernel.org/all/20240826103522.390faa85@gandalf.local.home/
  - Update commit title and add suggested-by tag

v1: https://lore.kernel.org/all/20240824030343.3218618-1-zhengyejian@huaweicloud.com/

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebe7ce2f5f4a..edf6bc817aa1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3958,6 +3958,8 @@ void tracing_iter_reset(struct trace_iterator *iter, int cpu)
 			break;
 		entries++;
 		ring_buffer_iter_advance(buf_iter);
+		/* This could be a big loop */
+		cond_resched();
 	}
 
 	per_cpu_ptr(iter->array_buffer->data, cpu)->skipped_entries = entries;
-- 
2.25.1
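
For reference, with this change applied, the skip loop in
tracing_iter_reset() reads as below (reconstructed from the hunk above
together with Steve's earlier excerpt; the iterator setup at the top of
the function is elided):

void tracing_iter_reset(struct trace_iterator *iter, int cpu)
{
	/* ... buf_iter/entries/ts setup elided ... */

	/* Skip entries recorded before the buffer's start time. */
	while (ring_buffer_iter_peek(buf_iter, &ts)) {
		if (ts >= iter->array_buffer->time_start)
			break;
		entries++;
		ring_buffer_iter_advance(buf_iter);
		/* This could be a big loop */
		cond_resched();
	}

	per_cpu_ptr(iter->array_buffer->data, cpu)->skipped_entries = entries;
}
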
Re: [PATCH] tracing: Mitigate possible softlockup in __tracing_open()
Posted by Masami Hiramatsu (Google) 1 year, 3 months ago
On Sat, 24 Aug 2024 11:03:43 +0800
Zheng Yejian <zhengyejian@huaweicloud.com> wrote:

> In __tracing_open(), when a max latency tracer has run on a cpu, the
> start time of its buffer will have been updated, and event entries with
> timestamps earlier than the start of the buffer are skipped (see
> tracing_iter_reset()).
> 
> A softlockup can occur if the kernel is non-preemptible and too many
> entries are skipped in the loop that resets every cpu buffer, so add
> cond_resched() to avoid it.
> 
> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
> ---
>  kernel/trace/trace.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index ebe7ce2f5f4a..88faa95b457b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot)
>  		for_each_tracing_cpu(cpu) {
>  			ring_buffer_read_start(iter->buffer_iter[cpu]);
>  			tracing_iter_reset(iter, cpu);
> +			/*
> +			 * When a max latency tracer has run on this cpu, the start
> +			 * time of its buffer will have been updated, and event entries
> +			 * with timestamps earlier than the start of the buffer are
> +			 * skipped (see tracing_iter_reset()). A softlockup can occur
> +			 * if the kernel is non-preemptible and too many entries are
> +			 * skipped in this loop, so add cond_resched() to mitigate it.
> +			 */
> +			cond_resched();

OK, but why do we cond_resched() only here? Doesn't the other
tracing_iter_reset() call in s_start() cause soft lockups in the same
situation?

Thank you,


>  		}
>  	} else {
>  		cpu = iter->cpu_file;
> -- 
> 2.25.1
> 
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Re: [PATCH] tracing: Mitigate possible softlockup in __tracing_open()
Posted by Zheng Yejian 1 year, 3 months ago
On 2024/8/25 23:05, Masami Hiramatsu (Google) wrote:
> On Sat, 24 Aug 2024 11:03:43 +0800
> Zheng Yejian <zhengyejian@huaweicloud.com> wrote:
> 
>> In __tracing_open(), when a max latency tracer has run on a cpu, the
>> start time of its buffer will have been updated, and event entries with
>> timestamps earlier than the start of the buffer are skipped (see
>> tracing_iter_reset()).
>>
>> A softlockup can occur if the kernel is non-preemptible and too many
>> entries are skipped in the loop that resets every cpu buffer, so add
>> cond_resched() to avoid it.
>>
>> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
>> ---
>>   kernel/trace/trace.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index ebe7ce2f5f4a..88faa95b457b 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot)
>>   		for_each_tracing_cpu(cpu) {
>>   			ring_buffer_read_start(iter->buffer_iter[cpu]);
>>   			tracing_iter_reset(iter, cpu);
>> +			/*
>> +			 * When a max latency tracer has run on this cpu, the start
>> +			 * time of its buffer will have been updated, and event entries
>> +			 * with timestamps earlier than the start of the buffer are
>> +			 * skipped (see tracing_iter_reset()). A softlockup can occur
>> +			 * if the kernel is non-preemptible and too many entries are
>> +			 * skipped in this loop, so add cond_resched() to mitigate it.
>> +			 */
>> +			cond_resched();
> 
> OK, but why do we cond_resched() only here? Doesn't the other
> tracing_iter_reset() call in s_start() cause soft lockups in the same
> situation?
> 

Sorry for the late reply.

__tracing_open() is called when the trace file is being opened, while
s_start() is called when the trace file is being read. Normally we read
immediately after opening, at which point tracing_iter_reset() has just
been called, so a softlockup may not be that easily triggered in
s_start(). Do we need cond_resched() there as well?
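
For context, s_start() is the trace file's seq_file .start callback,
wired up in kernel/trace/trace.c, so it runs at the beginning of every
read session of the file:

	/*
	 * The trace file's seq_file operations in kernel/trace/trace.c:
	 * s_start() runs at the start of each read and may also call
	 * tracing_iter_reset() to rewind the iterator.
	 */
	static const struct seq_operations tracer_seq_ops = {
		.start		= s_start,
		.next		= s_next,
		.stop		= s_stop,
		.show		= s_show,
	};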



> Thank you,
> 
> 
>>   		}
>>   	} else {
>>   		cpu = iter->cpu_file;
>> -- 
>> 2.25.1
>>
>>
> 
> 

-- 
Thanks,
Zheng Yejian