[PATCH] perf top: expand the range of multithreaded phase

Hangliang Lai posted 1 patch 2 years, 11 months ago
There is a newer version of this series
tools/perf/builtin-top.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
[PATCH] perf top: expand the range of multithreaded phase
Posted by Hangliang Lai 2 years, 11 months ago
In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
donw_read and down_write can work to handle concurrency problems. Then top 
use perf_set_singlethreaded and switch to single threaded phase, assuming 
that no thread concurrency will happen later. 

However, a UAF problem could occur in perf top in single threaded phase, 
The concurrent procedure is like this:

display_thread                              process_thread
--------------                              --------------

thread__comm_len
  -> thread__comm_str
    -> __thread__comm_str(thread)
                                            thread__delete
                                             -> comm__free
                                              -> comm_str__put
                                               -> zfree(&cs->str)
    -> thread->comm_len = strlen(comm);

Since in single thread phase, perf_singlethreaded is true, down_read and 
down_write can not work to avoid concurrency problems.

This patch put perf_set_singlethreaded to the function tail to expand the 
multithreaded phase range, make display_thread and process_thread run
safe. 

Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
Reported-by: Wenyu Liu <liuwenyu7@huawei.com>
Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
---
 tools/perf/builtin-top.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 7c6413447..74239940b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1280,9 +1280,6 @@ static int __cmd_top(struct perf_top *top)
 				    top->evlist->core.threads, false,
 				    top->nr_threads_synthesize);
 
-	if (top->nr_threads_synthesize > 1)
-		perf_set_singlethreaded();
-
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
 		if (ret < 0) {
@@ -1359,6 +1356,10 @@ out_join:
 out_join_thread:
 	pthread_cond_signal(&top->qe.cond);
 	pthread_join(thread_process, NULL);
+
+	if (top->nr_threads_synthesize > 1)
+		perf_set_singlethreaded();
+
 	return ret;
 }
 
-- 
2.33.0
Re: [PATCH] perf top: expand the range of multithreaded phase
Posted by Namhyung Kim 2 years, 10 months ago
Hello,

Sorry for the late reply.

On Fri, Mar 17, 2023 at 3:05 AM Hangliang Lai <laihangliang1@huawei.com> wrote:
>
> In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
> donw_read and down_write can work to handle concurrency problems. Then top
> use perf_set_singlethreaded and switch to single threaded phase, assuming
> that no thread concurrency will happen later.
>
> However, a UAF problem could occur in perf top in single threaded phase,
> The concurrent procedure is like this:
>
> display_thread                              process_thread
> --------------                              --------------
>
> thread__comm_len
>   -> thread__comm_str
>     -> __thread__comm_str(thread)
>                                             thread__delete
>                                              -> comm__free
>                                               -> comm_str__put
>                                                -> zfree(&cs->str)
>     -> thread->comm_len = strlen(comm);
>
> Since in single thread phase, perf_singlethreaded is true, down_read and
> down_write can not work to avoid concurrency problems.
>
> This patch put perf_set_singlethreaded to the function tail to expand the
> multithreaded phase range, make display_thread and process_thread run
> safe.

I think it should be unconditional as perf top is always multi-threaded.

>
> Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
> Reported-by: Wenyu Liu <liuwenyu7@huawei.com>
> Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
> ---
>  tools/perf/builtin-top.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 7c6413447..74239940b 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1280,9 +1280,6 @@ static int __cmd_top(struct perf_top *top)
>                                     top->evlist->core.threads, false,
>                                     top->nr_threads_synthesize);
>
> -       if (top->nr_threads_synthesize > 1)
> -               perf_set_singlethreaded();

Instead, we can simply do

    perf_set_multithreaded();

If top->nr_threads_synthesize > 1, no effect.  If not, it turns
the switch on here.

> -
>         if (perf_hpp_list.socket) {
>                 ret = perf_env__read_cpu_topology_map(&perf_env);
>                 if (ret < 0) {
> @@ -1359,6 +1356,10 @@ out_join:
>  out_join_thread:
>         pthread_cond_signal(&top->qe.cond);
>         pthread_join(thread_process, NULL);
> +
> +       if (top->nr_threads_synthesize > 1)
> +               perf_set_singlethreaded();

And remove the condition here.

Thanks,
Namhyung


> +
>         return ret;
>  }
>
> --
> 2.33.0
>
[PATCH v2] perf top: expand the range of multithreaded phase
Posted by Hangliang Lai 2 years, 10 months ago
In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
donw_read and down_write can work to handle concurrency problems. Then top
use perf_set_singlethreaded and switch to single threaded phase, assuming
that no thread concurrency will happen later.
However, a UAF problem could occur in perf top in single threaded phase,
The concurrent procedure is like this:
display_thread                              process_thread
--------------                              --------------
thread__comm_len
  -> thread__comm_str
    -> __thread__comm_str(thread)
                                            thread__delete
                                             -> comm__free
                                              -> comm_str__put
                                               -> zfree(&cs->str)
    -> thread->comm_len = strlen(comm);
Since in single thread phase, perf_singlethreaded is true, down_read and
down_write can not work to avoid concurrency problems.
This patch put perf_set_singlethreaded to the function tail to expand the
multithreaded phase range, make display_thread and process_thread run
safe.

Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
---
v1 -> v2
 - Since perf top is always multi-threaded, remove top->nr_threads_synthesize judgment.

 tools/perf/builtin-top.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d4b5b02bab73..a18db1ee87fa 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1242,8 +1242,7 @@ static int __cmd_top(struct perf_top *top)
 	if (perf_session__register_idle_thread(top->session) < 0)
 		return ret;
 
-	if (top->nr_threads_synthesize > 1)
-		perf_set_multithreaded();
+	perf_set_multithreaded();
 
 	init_process_thread(top);
 
@@ -1273,9 +1272,6 @@ static int __cmd_top(struct perf_top *top)
 				    top->evlist->core.threads, true, false,
 				    top->nr_threads_synthesize);
 
-	if (top->nr_threads_synthesize > 1)
-		perf_set_singlethreaded();
-
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
 		if (ret < 0) {
@@ -1352,6 +1348,9 @@ static int __cmd_top(struct perf_top *top)
 out_join_thread:
 	cond_signal(&top->qe.cond);
 	pthread_join(thread_process, NULL);
+
+	perf_set_singlethreaded();
+
 	return ret;
 }
 
-- 
2.33.0
Re: [PATCH v2] perf top: expand the range of multithreaded phase
Posted by Namhyung Kim 2 years, 10 months ago
Hello,

On Wed, Apr 5, 2023 at 7:54 PM Hangliang Lai <laihangliang1@huawei.com> wrote:
>
> In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
> donw_read and down_write can work to handle concurrency problems. Then top
> use perf_set_singlethreaded and switch to single threaded phase, assuming
> that no thread concurrency will happen later.
> However, a UAF problem could occur in perf top in single threaded phase,
> The concurrent procedure is like this:
> display_thread                              process_thread
> --------------                              --------------
> thread__comm_len
>   -> thread__comm_str
>     -> __thread__comm_str(thread)
>                                             thread__delete
>                                              -> comm__free
>                                               -> comm_str__put
>                                                -> zfree(&cs->str)
>     -> thread->comm_len = strlen(comm);
> Since in single thread phase, perf_singlethreaded is true, down_read and
> down_write can not work to avoid concurrency problems.
> This patch put perf_set_singlethreaded to the function tail to expand the
> multithreaded phase range, make display_thread and process_thread run
> safe.
>
> Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
> Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
> ---
> v1 -> v2
>  - Since perf top is always multi-threaded, remove top->nr_threads_synthesize judgment.

Not always, the synthesis can run in a single thread.

>
>  tools/perf/builtin-top.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index d4b5b02bab73..a18db1ee87fa 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1242,8 +1242,7 @@ static int __cmd_top(struct perf_top *top)
>         if (perf_session__register_idle_thread(top->session) < 0)
>                 return ret;
>
> -       if (top->nr_threads_synthesize > 1)
> -               perf_set_multithreaded();
> +       perf_set_multithreaded();

I think this part should be kept as is.

>
>         init_process_thread(top);
>
> @@ -1273,9 +1272,6 @@ static int __cmd_top(struct perf_top *top)
>                                     top->evlist->core.threads, true, false,
>                                     top->nr_threads_synthesize);
>
> -       if (top->nr_threads_synthesize > 1)
> -               perf_set_singlethreaded();

Here you can make it multi-threaded unconditionally.

Thanks,
Namhyung

> -
>         if (perf_hpp_list.socket) {
>                 ret = perf_env__read_cpu_topology_map(&perf_env);
>                 if (ret < 0) {
> @@ -1352,6 +1348,9 @@ static int __cmd_top(struct perf_top *top)
>  out_join_thread:
>         cond_signal(&top->qe.cond);
>         pthread_join(thread_process, NULL);
> +
> +       perf_set_singlethreaded();
> +
>         return ret;
>  }
>
> --
> 2.33.0
>
Re: [PATCH v2] perf top: expand the range of multithreaded phase
Posted by Hangliang Lai 2 years, 10 months ago
Thanks for your reply Kim ,

On 2023-04-07 21:21   you  wrote:

>  Not always, the synthesis can run in a single thread.

But I think in machine__synthesize_threads, there are thread_nr threads will be created to do synthesize_threads_worker(tools/perf/util/synthetic-events.c:970)

It\ufffd\ufffd\ufffds not a single thread part. So we're supposed to call perf_set_multithreaded() before synthesize?

Thanks,
Hangliang Lai
Re: [PATCH v2] perf top: expand the range of multithreaded phase
Posted by Wenyu Liu(D) 2 years, 10 months ago
Hello,I think Namhyung means only make it multi-threaded unconditionally after the synthesize

a patch like this:

---
 tools/perf/builtin-top.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d4b5b02bab73..60d00975b881 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1273,8 +1273,7 @@ static int __cmd_top(struct perf_top *top)
                                    top->evlist->core.threads, true, false,
                                    top->nr_threads_synthesize);

-       if (top->nr_threads_synthesize > 1)
-               perf_set_singlethreaded();
+       perf_set_multithreaded();

        if (perf_hpp_list.socket) {
                ret = perf_env__read_cpu_topology_map(&perf_env);
--

Right?

Thanks,
Wenyu

在 2023/4/10 10:58, Hangliang Lai 写道:
> Thanks for your reply Kim ,
> 
> 
> 
> On 2023-04-07 21:21   you  wrote:
> 
> 
> 
>>  Not always, the synthesis can run in a single thread.
> 
> 
> 
> But I think in machine__synthesize_threads, there are thread_nr threads will be created to do synthesize_threads_worker(tools/perf/util/synthetic-events.c:970)
> 
> 
> 
> It’s not a single thread part. So we're supposed to call perf_set_multithreaded() before synthesize?
> 
> 
> 
> Thanks,
> 
> Hangliang Lai
> 
> 
> 
[PATCH v3] perf top: expand the range of multithreaded phase
Posted by Hangliang Lai 2 years, 10 months ago
In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
donw_read and down_write can work to handle concurrency problems. Then top
use perf_set_singlethreaded and switch to single threaded phase, assuming
that no thread concurrency will happen later.
However, a UAF problem could occur in perf top in single threaded phase,
The concurrent procedure is like this:
display_thread                              process_thread
--------------                              --------------
thread__comm_len
  -> thread__comm_str
    -> __thread__comm_str(thread)
                                            thread__delete
                                             -> comm__free
                                              -> comm_str__put
                                               -> zfree(&cs->str)
    -> thread->comm_len = strlen(comm);
Since in single thread phase, perf_singlethreaded is true, down_read and
down_write can not work to avoid concurrency problems.
This patch put perf_set_singlethreaded to the function tail to expand the
multithreaded phase range, make display_thread and process_thread run
safe.

Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
---
v2 -> v3
 - Sorry for my misunderstanding, patch v3 makes perf_set_multithreaded
   unconditional after synthesis and set_singlethread in the end.

 tools/perf/builtin-top.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d4b5b02bab73..ae96ddaf85c4 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1273,8 +1273,7 @@ static int __cmd_top(struct perf_top *top)
 				    top->evlist->core.threads, true, false,
 				    top->nr_threads_synthesize);
 
-	if (top->nr_threads_synthesize > 1)
-		perf_set_singlethreaded();
+	perf_set_multithreaded();
 
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
@@ -1352,6 +1351,7 @@ static int __cmd_top(struct perf_top *top)
 out_join_thread:
 	cond_signal(&top->qe.cond);
 	pthread_join(thread_process, NULL);
+	perf_set_singlethreaded();
 	return ret;
 }
 
-- 
2.33.0
Re: [PATCH v3] perf top: expand the range of multithreaded phase
Posted by Namhyung Kim 2 years, 10 months ago
Hello,

On Mon, Apr 10, 2023 at 6:22 AM Hangliang Lai <laihangliang1@huawei.com> wrote:
>
> In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
> donw_read and down_write can work to handle concurrency problems. Then top
> use perf_set_singlethreaded and switch to single threaded phase, assuming
> that no thread concurrency will happen later.
> However, a UAF problem could occur in perf top in single threaded phase,
> The concurrent procedure is like this:
> display_thread                              process_thread
> --------------                              --------------
> thread__comm_len
>   -> thread__comm_str
>     -> __thread__comm_str(thread)
>                                             thread__delete
>                                              -> comm__free
>                                               -> comm_str__put
>                                                -> zfree(&cs->str)
>     -> thread->comm_len = strlen(comm);
> Since in single thread phase, perf_singlethreaded is true, down_read and
> down_write can not work to avoid concurrency problems.
> This patch put perf_set_singlethreaded to the function tail to expand the
> multithreaded phase range, make display_thread and process_thread run
> safe.
>
> Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
> Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
> v2 -> v3
>  - Sorry for my misunderstanding, patch v3 makes perf_set_multithreaded
>    unconditional after synthesis and set_singlethread in the end.
>
>  tools/perf/builtin-top.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index d4b5b02bab73..ae96ddaf85c4 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1273,8 +1273,7 @@ static int __cmd_top(struct perf_top *top)
>                                     top->evlist->core.threads, true, false,
>                                     top->nr_threads_synthesize);
>
> -       if (top->nr_threads_synthesize > 1)
> -               perf_set_singlethreaded();
> +       perf_set_multithreaded();
>
>         if (perf_hpp_list.socket) {
>                 ret = perf_env__read_cpu_topology_map(&perf_env);
> @@ -1352,6 +1351,7 @@ static int __cmd_top(struct perf_top *top)
>  out_join_thread:
>         cond_signal(&top->qe.cond);
>         pthread_join(thread_process, NULL);
> +       perf_set_singlethreaded();
>         return ret;
>  }
>
> --
> 2.33.0
>
[PATCH v4] perf top: expand the range of multithreaded phase
Posted by Hangliang Lai 2 years, 10 months ago
In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
donw_read and down_write can work to handle concurrency problems. Then top
use perf_set_singlethreaded and switch to single threaded phase, assuming
that no thread concurrency will happen later.
However, a UAF problem could occur in perf top in single threaded phase,
The concurrent procedure is like this:
display_thread                              process_thread
--------------                              --------------
thread__comm_len
  -> thread__comm_str
    -> __thread__comm_str(thread)
                                            thread__delete
                                             -> comm__free
                                              -> comm_str__put
                                               -> zfree(&cs->str)
    -> thread->comm_len = strlen(comm);
Since in single thread phase, perf_singlethreaded is true, down_read and
down_write can not work to avoid concurrency problems.
This patch put perf_set_singlethreaded to the function tail to expand the
multithreaded phase range, make display_thread and process_thread run
safe.

Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
Co-developed-by: Wenyu Liu <liuwenyu7@huawei.com>
Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
---
v3 -> v4
 - Add Acked-by and Co-developed-by. 

 tools/perf/builtin-top.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d4b5b02bab73..ae96ddaf85c4 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1273,8 +1273,7 @@ static int __cmd_top(struct perf_top *top)
 				    top->evlist->core.threads, true, false,
 				    top->nr_threads_synthesize);
 
-	if (top->nr_threads_synthesize > 1)
-		perf_set_singlethreaded();
+	perf_set_multithreaded();
 
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
@@ -1352,6 +1351,7 @@ static int __cmd_top(struct perf_top *top)
 out_join_thread:
 	cond_signal(&top->qe.cond);
 	pthread_join(thread_process, NULL);
+	perf_set_singlethreaded();
 	return ret;
 }
 
-- 
2.33.0
Re: [PATCH v4] perf top: expand the range of multithreaded phase
Posted by Arnaldo Carvalho de Melo 2 years, 10 months ago
Em Tue, Apr 11, 2023 at 09:32:24AM +0800, Hangliang Lai escreveu:
> In __cmd_top, perf_set_multithreaded is used to enable pthread_rwlock, thus
> donw_read and down_write can work to handle concurrency problems. Then top
> use perf_set_singlethreaded and switch to single threaded phase, assuming
> that no thread concurrency will happen later.
> However, a UAF problem could occur in perf top in single threaded phase,
> The concurrent procedure is like this:
> display_thread                              process_thread
> --------------                              --------------
> thread__comm_len
>   -> thread__comm_str
>     -> __thread__comm_str(thread)
>                                             thread__delete
>                                              -> comm__free
>                                               -> comm_str__put
>                                                -> zfree(&cs->str)
>     -> thread->comm_len = strlen(comm);
> Since in single thread phase, perf_singlethreaded is true, down_read and
> down_write can not work to avoid concurrency problems.
> This patch put perf_set_singlethreaded to the function tail to expand the
> multithreaded phase range, make display_thread and process_thread run
> safe.
> 
> Signed-off-by: Hangliang Lai  <laihangliang1@huawei.com>
> Co-developed-by: Wenyu Liu <liuwenyu7@huawei.com>
> Reviewed-by: Yunfeng Ye <yeyunfeng@huawei.com>
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied.

- Arnaldo


> ---
> v3 -> v4
>  - Add Acked-by and Co-developed-by. 
> 
>  tools/perf/builtin-top.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index d4b5b02bab73..ae96ddaf85c4 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1273,8 +1273,7 @@ static int __cmd_top(struct perf_top *top)
>  				    top->evlist->core.threads, true, false,
>  				    top->nr_threads_synthesize);
>  
> -	if (top->nr_threads_synthesize > 1)
> -		perf_set_singlethreaded();
> +	perf_set_multithreaded();
>  
>  	if (perf_hpp_list.socket) {
>  		ret = perf_env__read_cpu_topology_map(&perf_env);
> @@ -1352,6 +1351,7 @@ static int __cmd_top(struct perf_top *top)
>  out_join_thread:
>  	cond_signal(&top->qe.cond);
>  	pthread_join(thread_process, NULL);
> +	perf_set_singlethreaded();
>  	return ret;
>  }
>  
> -- 
> 2.33.0
> 

-- 

- Arnaldo