[v2] perf/lock: enable end-timestamp accounting for cgroup aggregation

[PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation

Posted by Suchit Karunakaran 1 week, 1 day ago

update_lock_stat() handles lock contentions that start but never reach a
contention_end event (e.g., locks still held when profiling stops), but
previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
context in userspace.

Fix this by adding a cgroup_id field to struct tstamp_data and recording
it at contention_begin using get_current_cgroup_id() when aggr_mode is
LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
correct: the contention cost is incurred by the task that had to wait,
not by whatever task happens to be running at contention_end. It is also
preferable from a performance standpoint, as contention_end runs just
before the task enters the critical section, so any work done there
delays the waiter further.

Update contention_end to use pelem->cgroup_id instead of calling
get_current_cgroup_id() again, so that complete and incomplete
contention events both attribute the wait time to the cgroup recorded
when the wait started.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
---
 tools/perf/util/bpf_lock_contention.c          | 4 ++--
 tools/perf/util/bpf_skel/lock_contention.bpf.c | 4 +++-
 tools/perf/util/bpf_skel/lock_data.h           | 1 +
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index cbd7435579fe..1a5bd2ff8ee4 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
 		stat_key.lock_addr_or_cgroup = ts_data->lock;
 		break;
 	case LOCK_AGGR_CGROUP:
-		/* TODO */
-		return;
+		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
+		break;
 	default:
 		return;
 	}
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 96e7d853b9ed..139199811020 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -536,6 +536,8 @@ int contention_begin(u64 *ctx)
 	pelem->timestamp = bpf_ktime_get_ns();
 	pelem->lock = (__u64)ctx[0];
 	pelem->flags = (__u32)ctx[1];
+	if (aggr_mode == LOCK_AGGR_CGROUP)
+		pelem->cgroup_id = get_current_cgroup_id();
 
 	if (needs_callstack) {
 		u32 i = 0;
@@ -771,7 +773,7 @@ int contention_end(u64 *ctx)
 			key.stack_id = pelem->stack_id;
 		break;
 	case LOCK_AGGR_CGROUP:
-		key.lock_addr_or_cgroup = get_current_cgroup_id();
+		key.lock_addr_or_cgroup = pelem->cgroup_id;
 		break;
 	default:
 		/* should not happen */
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index 28c5e5aced7f..652e114e6b87 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -13,6 +13,7 @@ struct owner_tracing_data {
 struct tstamp_data {
 	u64 timestamp;
 	u64 lock;
+	u64 cgroup_id;
 	u32 flags;
 	s32 stack_id;
 };
-- 
2.54.0
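
The attribution rule described in the commit message can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions, not the perf BPF program: current_cgroup_id stands in for
get_current_cgroup_id(), the stats[] table stands in for the lock_stat
map, and the model_* functions, MAX_CGROUPS, and the convention of
clearing the lock field on completion are hypothetical names and choices
made for illustration.

```c
/* Userspace model of the attribution logic; not the perf BPF program. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CGROUPS 8	/* hypothetical bound for this sketch */

/* Mirrors struct tstamp_data after the patch, using stdint types. */
struct tstamp_data {
	uint64_t timestamp;
	uint64_t lock;
	uint64_t cgroup_id;
	uint32_t flags;
	int32_t stack_id;
};

/* Hypothetical per-cgroup totals, standing in for the lock_stat map. */
struct cgroup_stat {
	uint64_t cgroup_id;
	uint64_t total_wait_ns;
	uint32_t nr_contended;
	int used;
};

static struct cgroup_stat stats[MAX_CGROUPS];

/* Stand-in for get_current_cgroup_id(); the caller sets it directly. */
uint64_t current_cgroup_id;

/* Find the entry for a cgroup, creating an empty one if needed. */
static struct cgroup_stat *lookup_stat(uint64_t cgroup_id)
{
	for (size_t i = 0; i < MAX_CGROUPS; i++) {
		if (stats[i].used && stats[i].cgroup_id == cgroup_id)
			return &stats[i];
	}
	for (size_t i = 0; i < MAX_CGROUPS; i++) {
		if (!stats[i].used) {
			stats[i].used = 1;
			stats[i].cgroup_id = cgroup_id;
			return &stats[i];
		}
	}
	assert(0 && "stat table full");
	return NULL;
}

/* Charge the wait to the cgroup saved in the timestamp entry. */
static void account(const struct tstamp_data *ts, uint64_t end_ts)
{
	struct cgroup_stat *st = lookup_stat(ts->cgroup_id);

	st->total_wait_ns += end_ts - ts->timestamp;
	st->nr_contended++;
}

/* Wait starts: remember who is waiting, as the patch does. */
void model_contention_begin(struct tstamp_data *ts, uint64_t now, uint64_t lock)
{
	ts->timestamp = now;
	ts->lock = lock;
	ts->cgroup_id = current_cgroup_id;
}

/* Wait ends: use the saved id, not whatever is current now. */
void model_contention_end(struct tstamp_data *ts, uint64_t now)
{
	account(ts, now);
	ts->lock = 0;	/* assumption: a zero lock marks the entry as done */
}

/* Profiling stops with the wait still open (the update_lock_stat case). */
void model_flush_incomplete(const struct tstamp_data *ts, uint64_t end_ts)
{
	if (ts->lock != 0)
		account(ts, end_ts);
}

uint64_t model_total_wait(uint64_t cgroup_id)
{
	return lookup_stat(cgroup_id)->total_wait_ns;
}
```

In this model, a wait that begins while current_cgroup_id is 100 is
charged to cgroup 100 whether it finishes through model_contention_end()
or is closed out by model_flush_incomplete(), even if current_cgroup_id
has changed in between. That is the consistency between complete and
incomplete events that the commit message describes.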

Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation

Posted by Arnaldo Carvalho de Melo 3 days, 17 hours ago

On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> update_lock_stat() handles lock contentions that start but never reach a
> contention_end event (e.g., locks still held when profiling stops), but
> previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> context in userspace.
> 
> Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> at contention_begin using get_current_cgroup_id() when aggr_mode is
> LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> correct, the contention cost is incurred by the task that had to wait,
> not by whatever task happens to be running at contention_end. It is also
> preferable from a performance standpoint, as contention_end runs just
> before the task enters the critical section.
> 
> Update contention_end to use pelem->cgroup_id instead of calling
> get_current_cgroup_id() dynamically, ensuring both complete and
> incomplete contention events attribute the wait time to the cgroup at
> wait-start time consistently.

Namhyung, can you provide an Acked-by or Reviewed-by?

Thanks,

- Arnaldo
 

Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation

Posted by Namhyung Kim 3 days, 15 hours ago

On Thu, Jun 04, 2026 at 10:37:05AM -0300, Arnaldo Carvalho de Melo wrote:
> On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> > update_lock_stat() handles lock contentions that start but never reach a
> > contention_end event (e.g., locks still held when profiling stops), but
> > previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> > context in userspace.
> > 
> > Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> > at contention_begin using get_current_cgroup_id() when aggr_mode is
> > LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> > correct, the contention cost is incurred by the task that had to wait,
> > not by whatever task happens to be running at contention_end. It is also
> > preferable from a performance standpoint, as contention_end runs just
> > before the task enters the critical section.
> > 
> > Update contention_end to use pelem->cgroup_id instead of calling
> > get_current_cgroup_id() dynamically, ensuring both complete and
> > incomplete contention events attribute the wait time to the cgroup at
> > wait-start time consistently.
> 
> Namhyung, can you provide an Acked-by or Reviewed-by?

Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation

Posted by Arnaldo Carvalho de Melo 3 days, 11 hours ago

On Thu, Jun 04, 2026 at 09:31:15AM -0700, Namhyung Kim wrote:
> On Thu, Jun 04, 2026 at 10:37:05AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> > > update_lock_stat() handles lock contentions that start but never reach a
> > > contention_end event (e.g., locks still held when profiling stops), but
> > > previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> > > context in userspace.
> > > 
> > > Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> > > at contention_begin using get_current_cgroup_id() when aggr_mode is
> > > LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> > > correct, the contention cost is incurred by the task that had to wait,
> > > not by whatever task happens to be running at contention_end. It is also
> > > preferable from a performance standpoint, as contention_end runs just
> > > before the task enters the critical section.
> > > 
> > > Update contention_end to use pelem->cgroup_id instead of calling
> > > get_current_cgroup_id() dynamically, ensuring both complete and
> > > incomplete contention events attribute the wait time to the cgroup at
> > > wait-start time consistently.
> > 
> > Namhyung, can you provide an Acked-by or Reviewed-by?
> 
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied to perf-tools-next, for v7.2.

- Arnaldo