From nobody Wed Apr 29 09:34:26 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E3328C43334
	for <linux-kernel@archiver.kernel.org>; Thu,  9 Jun 2022 11:31:06 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S243676AbiFILbE (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 9 Jun 2022 07:31:04 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53600 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S243635AbiFILbC (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 9 Jun 2022 07:31:02 -0400
Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com
 [IPv6:2607:f8b0:4864:20::b49])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3893A39C107
        for <linux-kernel@vger.kernel.org>;
 Thu,  9 Jun 2022 04:31:01 -0700 (PDT)
Received: by mail-yb1-xb49.google.com with SMTP id
 q200-20020a252ad1000000b006632baa38deso13550585ybq.15
        for <linux-kernel@vger.kernel.org>;
 Thu, 09 Jun 2022 04:31:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=vZHbZ1oi5OSjCMeU4/rneix3L5XarWjinLy785pgor0=;
        b=rJ6pp5xowS86Nd+/gorliRhFqUB5J+6mavLLuFiIdbWcQEhh6CT0t2PyhmpZ7XlHuV
         fJEKAnXZefd7pn33hZxyEaRytMDaJ8OfYiSvmXWHsOaxoj8OsWOqbyD3u8Hqg4tEuyux
         zsPtCAqnIgLyFyTYD2Fcn6nm+BwXQfHXoAuBsmLe34GoBAI4YIIqe45Sqp8NV28d0S+D
         bltiNySjqDhkJoauzSpEgnSsKrVSYQZ0l486YY3xoXoI+wtdC1H3VBf2GCJqNQZFzdYt
         v1IYGNzqEUhFbjtOBXkScAJ9ok5FhxPFKqh0SSpwI1MU6XFAngUa/85tiBwSpXZ6u1UF
         7JFg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=vZHbZ1oi5OSjCMeU4/rneix3L5XarWjinLy785pgor0=;
        b=iDjCy4XoJqp4qbJ+G2UNo+In2YO1UBvKT/rvQaGSUfDTJyd2BQxTgcL2m3ZwidB0ph
         h/TWVBBa21C+BMjs6OCyIMeXL5ll8bp50nPkN4Gvxw7ZDDXZEaGpJWGNqemTO+/I4t9Y
         5dzjJDJfNDdAhIh9OjHnVKDehHFjj+7upgn3e+l/2HcSy9ZXJi7iCNUH0OGN3uAu8n7I
         QLu/SsQstvQ+DAXiHuJy0fBe38ehVtCrnd6BfpDEIR0BCVrOVMJm9bxe3hNuT1F5U4Gq
         VgbQYin6PTkxepd49L/SVBkl8nm6F/VbrFTnKABp0gHpQMLl2nClD722nj00xNZOz0uN
         X+3g==
X-Gm-Message-State: AOAM530aCX3tdmQTB2RpYhON0/Tt1jtQopO75G8lja8Qah5siIYxZuf9
        UpWiV7aDXbIeG+43/gU+c5CFCkEZMw==
X-Google-Smtp-Source: 
 ABdhPJxnsh9anBT52y2AD1F6IjCtDlLW/bu8fyR/DC08IH7PUuKLO9593httKBZZ5/ksOups8ZfqirI4yw==
X-Received: from elver.muc.corp.google.com
 ([2a00:79e0:9c:201:dcf:e5ba:10a5:1ea5])
 (user=elver job=sendgmr) by 2002:a25:dccd:0:b0:65c:bc72:75bf with SMTP id
 y196-20020a25dccd000000b0065cbc7275bfmr37441073ybe.315.1654774260428; Thu, 09
 Jun 2022 04:31:00 -0700 (PDT)
Date: Thu,  9 Jun 2022 13:30:39 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-2-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 1/8] perf/hw_breakpoint: Optimize list of per-task breakpoints
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

On a machine with 256 CPUs, running the recently added perf breakpoint
benchmark results in:

 | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 4 breakpoints and 64 parallelism
 |      Total time: 236.418 [sec]
 |
 |   123134.794271 usecs/op
 |  7880626.833333 usecs/op/cpu

The benchmark tests inherited breakpoint perf events across many
threads.
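For readers unfamiliar with the setup, this is roughly how an inherited breakpoint event is created from userspace. It is a minimal sketch and not the benchmark's actual code. It assumes the Linux UAPI headers are installed, and both helper names are hypothetical. With `inherit` set, every thread the caller creates afterwards gets its own copy of the event. Each copy is a per-task breakpoint that hw_breakpoint.c has to account for.

```c
#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: describe a 4-byte write breakpoint on @addr that is
 * inherited by threads created after the event has been opened. */
void init_inherited_bp_attr(struct perf_event_attr *attr, void *addr)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_BREAKPOINT;
	attr->size = sizeof(*attr);
	attr->bp_type = HW_BREAKPOINT_W;
	attr->bp_addr = (unsigned long)addr;
	attr->bp_len = HW_BREAKPOINT_LEN_4;
	attr->inherit = 1;        /* child threads get their own copy */
	attr->exclude_kernel = 1; /* user-space breakpoint only */
	attr->exclude_hv = 1;
}

/* Hypothetical helper: open the event for the calling task (pid 0) on any
 * CPU (-1) without a group leader (-1). Returns an fd, or -1 with errno set. */
int open_inherited_bp(void *addr)
{
	struct perf_event_attr attr;

	init_inherited_bp_attr(&attr, addr);
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}
```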

Looking at a perf profile, we can see that the majority of the time is
spent in various hw_breakpoint.c functions, which execute within the
'nr_bp_mutex' critical sections; this in turn results in contention on
that mutex as well:

    37.27%  [kernel]       [k] osq_lock
    34.92%  [kernel]       [k] mutex_spin_on_owner
    12.15%  [kernel]       [k] toggle_bp_slot
    11.90%  [kernel]       [k] __reserve_bp_slot

The culprit here is task_bp_pinned(), which has a runtime complexity of
O(#tasks) due to storing all task breakpoints in the same list and
iterating through that list looking for a matching task. Clearly, this
does not scale to thousands of tasks.
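The pre-patch lookup pattern can be sketched in plain C to show where the cost comes from. This is a simplified userspace model, not kernel code. `struct task` and `struct bp` are hypothetical stand-ins, and every breakpoint has weight 1, matching the default hw_breakpoint_weight(). Every task breakpoint in the system is visited, even though only those attached to `bp->target` can match.

```c
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and perf_event. */
struct task {
	int pid;
};

struct bp {
	struct task *target; /* task the breakpoint is attached to */
	int cpu;             /* -1 means "any CPU" */
	int type;            /* slot type index */
	struct bp *next;     /* link in the single global list */
};

/* One list holding the breakpoints of all tasks, like bp_task_head. */
static struct bp *bp_task_head;

void bp_list_add(struct bp *bp)
{
	bp->next = bp_task_head;
	bp_task_head = bp;
}

/* Cost grows with the total number of task breakpoints, i.e. O(#tasks). */
int task_bp_pinned_list(int cpu, const struct bp *bp, int type)
{
	int count = 0;

	for (const struct bp *iter = bp_task_head; iter; iter = iter->next) {
		if (iter->target == bp->target && iter->type == type &&
		    (iter->cpu < 0 || iter->cpu == cpu))
			count += 1; /* default breakpoint weight */
	}
	return count;
}
```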

While one option would be to make task_struct a breakpoint list node,
this would only further bloat task_struct for infrequently used data.

Instead, make use of the "rhashtable" variant "rhltable", which stores
multiple items with the same key in a list. This reduces the average
runtime complexity of task_bp_pinned() to O(1) in the number of tasks,
since only the breakpoints of the target task are visited.
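The idea behind the change can be modeled with a small chained hash table keyed by the target task pointer. This is a hypothetical, fixed-size sketch and not rhltable itself: rhltable additionally resizes, groups equal keys into per-key lists, and supports RCU lookups. `struct task` and `struct bp` are hypothetical stand-ins. A lookup only walks one bucket, so its average cost no longer depends on the number of tasks.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 64 /* fixed size; rhltable resizes itself instead */

/* Hypothetical stand-ins for task_struct and perf_event. */
struct task {
	int pid;
};

struct bp {
	struct task *target; /* hash key */
	int cpu;             /* -1 means "any CPU" */
	int type;            /* slot type index */
	struct bp *next;     /* link within a bucket */
};

static struct bp *buckets[NR_BUCKETS];

/* Fibonacci hashing of the key pointer; the top 6 bits select a bucket. */
static size_t bucket_of(const struct task *t)
{
	uint64_t h = (uint64_t)(uintptr_t)t * 0x9E3779B97F4A7C15ull;

	return (size_t)(h >> 58);
}

void bp_ht_insert(struct bp *bp)
{
	size_t i = bucket_of(bp->target);

	bp->next = buckets[i];
	buckets[i] = bp;
}

void bp_ht_remove(struct bp *bp)
{
	struct bp **pp = &buckets[bucket_of(bp->target)];

	while (*pp && *pp != bp)
		pp = &(*pp)->next;
	if (*pp)
		*pp = bp->next;
}

/* Only the bucket of bp->target is scanned; entries of other tasks that
 * collide into the same bucket are skipped by the key comparison. */
int task_bp_pinned_ht(int cpu, const struct bp *bp, int type)
{
	int count = 0;

	for (const struct bp *iter = buckets[bucket_of(bp->target)]; iter;
	     iter = iter->next) {
		if (iter->target == bp->target && iter->type == type &&
		    (iter->cpu < 0 || iter->cpu == cpu))
			count += 1; /* default breakpoint weight */
	}
	return count;
}
```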

With the optimization, the benchmark shows:

 | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 4 breakpoints and 64 parallelism
 |      Total time: 0.208 [sec]
 |
 |      108.422396 usecs/op
 |     6939.033333 usecs/op/cpu

On this particular setup that's a speedup of ~1135x.
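The figure follows from the per-operation times above, and the total run times give a consistent ratio:

```latex
\frac{123134.794271\ \mu\mathrm{s/op}}{108.422396\ \mu\mathrm{s/op}} \approx 1135.7,
\qquad
\frac{236.418\ \mathrm{s}}{0.208\ \mathrm{s}} \approx 1136.6
```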

Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
---
 include/linux/perf_event.h    |  3 +-
 kernel/events/hw_breakpoint.c | 56 ++++++++++++++++++++++-------------
 2 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 01231f1d976c..e27360436dc6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -36,6 +36,7 @@ struct perf_guest_info_callbacks {
 };

 #ifdef CONFIG_HAVE_HW_BREAKPOINT
+#include <linux/rhashtable-types.h>
 #include <asm/hw_breakpoint.h>
 #endif

@@ -178,7 +179,7 @@ struct hw_perf_event {
 			 * creation and event initalization.
 			 */
 			struct arch_hw_breakpoint	info;
-			struct list_head		bp_list;
+			struct rhlist_head		bp_list;
 		};
 #endif
 		struct { /* amd_iommu */
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index f32320ac02fd..25c94c6e918d 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -28,7 +28,7 @@
 #include <linux/sched.h>
 #include <linux/init.h>
 #include <linux/slab.h>
-#include <linux/list.h>
+#include <linux/rhashtable.h>
 #include <linux/cpu.h>
 #include <linux/smp.h>
 #include <linux/bug.h>
@@ -55,7 +55,13 @@ static struct bp_cpuinfo *get_bp_info(int cpu, enum bp_type_idx type)
 }

 /* Keep track of the breakpoints attached to tasks */
-static LIST_HEAD(bp_task_head);
+static struct rhltable task_bps_ht;
+static const struct rhashtable_params task_bps_ht_params = {
+	.head_offset = offsetof(struct hw_perf_event, bp_list),
+	.key_offset = offsetof(struct hw_perf_event, target),
+	.key_len = sizeof_field(struct hw_perf_event, target),
+	.automatic_shrinking = true,
+};

 static int constraints_initialized;

@@ -104,17 +110,23 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
  */
 static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 {
-	struct task_struct *tsk = bp->hw.target;
+	struct rhlist_head *head, *pos;
 	struct perf_event *iter;
 	int count = 0;

-	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
-		if (iter->hw.target == tsk &&
-		    find_slot_idx(iter->attr.bp_type) == type &&
+	rcu_read_lock();
+	head = rhltable_lookup(&task_bps_ht, &bp->hw.target, task_bps_ht_params);
+	if (!head)
+		goto out;
+
+	rhl_for_each_entry_rcu(iter, pos, head, hw.bp_list) {
+		if (find_slot_idx(iter->attr.bp_type) == type &&
 		    (iter->cpu < 0 || cpu == iter->cpu))
 			count += hw_breakpoint_weight(iter);
 	}

+out:
+	rcu_read_unlock();
 	return count;
 }

@@ -187,7 +199,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
 /*
  * Add/remove the given breakpoint in our constraint table
  */
-static void
+static int
 toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	       int weight)
 {
@@ -200,7 +212,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	/* Pinned counter cpu profiling */
 	if (!bp->hw.target) {
 		get_bp_info(bp->cpu, type)->cpu_pinned += weight;
-		return;
+		return 0;
 	}

 	/* Pinned counter task profiling */
@@ -208,9 +220,9 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 		toggle_bp_task_slot(bp, cpu, type, weight);

 	if (enable)
-		list_add_tail(&bp->hw.bp_list, &bp_task_head);
+		return rhltable_insert(&task_bps_ht, &bp->hw.bp_list, task_bps_ht_params);
 	else
-		list_del(&bp->hw.bp_list);
+		return rhltable_remove(&task_bps_ht, &bp->hw.bp_list, task_bps_ht_params);
 }

 __weak int arch_reserve_bp_slot(struct perf_event *bp)
@@ -308,9 +320,7 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 	if (ret)
 		return ret;

-	toggle_bp_slot(bp, true, type, weight);
-
-	return 0;
+	return toggle_bp_slot(bp, true, type, weight);
 }

 int reserve_bp_slot(struct perf_event *bp)
@@ -335,7 +345,7 @@ static void __release_bp_slot(struct perf_event *bp, u64 bp_type)

 	type = find_slot_idx(bp_type);
 	weight = hw_breakpoint_weight(bp);
-	toggle_bp_slot(bp, false, type, weight);
+	WARN_ON(toggle_bp_slot(bp, false, type, weight));
 }

 void release_bp_slot(struct perf_event *bp)
@@ -679,7 +689,7 @@ static struct pmu perf_breakpoint = {
 int __init init_hw_breakpoint(void)
 {
 	int cpu, err_cpu;
-	int i;
+	int i, ret;

 	for (i = 0; i < TYPE_MAX; i++)
 		nr_slots[i] = hw_breakpoint_slots(i);
@@ -690,18 +700,24 @@ int __init init_hw_breakpoint(void)

 			info->tsk_pinned = kcalloc(nr_slots[i], sizeof(int),
 							GFP_KERNEL);
-			if (!info->tsk_pinned)
-				goto err_alloc;
+			if (!info->tsk_pinned) {
+				ret = -ENOMEM;
+				goto err;
+			}
 		}
 	}

+	ret = rhltable_init(&task_bps_ht, &task_bps_ht_params);
+	if (ret)
+		goto err;
+
 	constraints_initialized = 1;

 	perf_pmu_register(&perf_breakpoint, "breakpoint", PERF_TYPE_BREAKPOINT);

 	return register_die_notifier(&hw_breakpoint_exceptions_nb);

- err_alloc:
+err:
 	for_each_possible_cpu(err_cpu) {
 		for (i = 0; i < TYPE_MAX; i++)
 			kfree(get_bp_info(err_cpu, i)->tsk_pinned);
@@ -709,7 +725,5 @@ int __init init_hw_breakpoint(void)
 			break;
 	}

-	return -ENOMEM;
+	return ret;
 }
-
-
-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9AF00C43334
	for <linux-kernel@archiver.kernel.org>; Thu,  9 Jun 2022 11:31:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S243748AbiFILbK (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 9 Jun 2022 07:31:10 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53816 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S243706AbiFILbF (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 9 Jun 2022 07:31:05 -0400
Received: from mail-ed1-x549.google.com (mail-ed1-x549.google.com
 [IPv6:2a00:1450:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9492D39E4A1
        for <linux-kernel@vger.kernel.org>;
 Thu,  9 Jun 2022 04:31:04 -0700 (PDT)
Received: by mail-ed1-x549.google.com with SMTP id
 m6-20020aa7c2c6000000b0042dc237d9e7so16944309edp.15
        for <linux-kernel@vger.kernel.org>;
 Thu, 09 Jun 2022 04:31:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=wnAKNm59aHUnnDDyhy7DQsgawVHUm92RhZIM8y499GE=;
        b=WkMVlpXnnh1ry0CE/UDyLuTslVl/KojJzPPXdlolY5n5SJY+KHvM0s4QPsDw/ImjRh
         H7xWhFfwc7tkUsc7mIagXa6nY0nBENo5RRHA3oEV6WH+2w2X3aHNZ85++S+erMQfqPkx
         vd4C/Xgk6CSQCYAYghdKE47r5IhfLhU8b566MOuhEDdT63B/h20KCT5kzXAs8AkFhe54
         tBIQJD1p2jmbzcuTVidce8EB/oRJaEyTbhMHSxR4ShMTEDvDheLB49ozQ9DEvxGI/ipf
         EYERxwz18K4afh4hwu2YtxEc0JbjpJC0jcF2XZbqOM1AiaYhRREJCDp2LBgG31/PJS3n
         AuLw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=wnAKNm59aHUnnDDyhy7DQsgawVHUm92RhZIM8y499GE=;
        b=cWhTR2Ohd+HFk5rxoYuqruFDw5DyR1D737y2FpwX4+d8TwLRHime/rUCjPbsoj7lRa
         ZzypVdm5OWYF043n6Cg0xn2xtL74b9sInMd6HEpKO+bkrJQxFnqolm6B1El/9dOIKoBG
         lpluYB0mhafdSmQ1FvDDbFosL5oEx2PPG5OBH6pRHCz9cjZndUIvRkumzQVrx0Nu2IoI
         O8SSQPWovZcdJKL5Cxg+fENW+ONSRxaTOTEBjdT9wYx1rJSmESSFod3ygbqRAJ/3GNF1
         qjTuyv2k+t6i7dwpJEiSoFGS3JQNITgpwYqB8DKjT4g+x/2Ttuh9JvN+rPdcncW7y9aL
         nNJw==
X-Gm-Message-State: AOAM531c2ggBuW7OWmlojVjIyjB8JS1Fd85jCrjn/wfXWB3rfZIuyro7
        sy/TvOTi1EI9LM4xYJG69r3p/5zH+g==
X-Google-Smtp-Source: 
 ABdhPJx73qjTeXCe10FaRzf/2yFdYj63Q0oseWtLrCparCyLdJilS10nXnGYV7x816rUyXswmnTt6sQNBQ==
X-Received: from elver.muc.corp.google.com
 ([2a00:79e0:9c:201:dcf:e5ba:10a5:1ea5])
 (user=elver job=sendgmr) by 2002:a05:6402:500b:b0:431:78d0:bf9d with SMTP id
 p11-20020a056402500b00b0043178d0bf9dmr17528643eda.184.1654774262890; Thu, 09
 Jun 2022 04:31:02 -0700 (PDT)
Date: Thu,  9 Jun 2022 13:30:40 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-3-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 2/8] perf/hw_breakpoint: Mark data __ro_after_init
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Mark read-only data after initialization as __ro_after_init.

While we are here, turn 'constraints_initialized' into a bool.
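__ro_after_init places a variable in a section that the kernel makes read-only once init has completed. A rough userspace analog keeps such data on its own page and drops write permission after initializing it. This sketch assumes a POSIX system with mmap() and mprotect(). The struct and function names are hypothetical, and the value of 4 slots per type is merely illustrative.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical analog of the two variables marked in this patch. */
struct ro_after_init_data {
	int nr_slots[2];
	bool constraints_initialized;
};

static struct ro_after_init_data *ro;

/* Write the data once, then make its page read-only. Later stores to *ro
 * fault with SIGSEGV, much like writes to __ro_after_init data after init. */
int init_ro_data(void)
{
	long page = sysconf(_SC_PAGESIZE);

	ro = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ro == MAP_FAILED)
		return -1;

	ro->nr_slots[0] = 4;
	ro->nr_slots[1] = 4;
	ro->constraints_initialized = true;

	return mprotect(ro, (size_t)page, PROT_READ);
}
```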

Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
---
 kernel/events/hw_breakpoint.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 25c94c6e918d..1f718745d569 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -47,7 +47,7 @@ struct bp_cpuinfo {
 };

 static DEFINE_PER_CPU(struct bp_cpuinfo, bp_cpuinfo[TYPE_MAX]);
-static int nr_slots[TYPE_MAX];
+static int nr_slots[TYPE_MAX] __ro_after_init;

 static struct bp_cpuinfo *get_bp_info(int cpu, enum bp_type_idx type)
 {
@@ -63,7 +63,7 @@ static const struct rhashtable_params task_bps_ht_params = {
 	.automatic_shrinking = true,
 };

-static int constraints_initialized;
+static bool constraints_initialized __ro_after_init;

 /* Gather the number of total pinned and un-pinned bp in a cpuset */
 struct bp_busy_slots {
@@ -711,7 +711,7 @@ int __init init_hw_breakpoint(void)
 	if (ret)
 		goto err;

-	constraints_initialized = 1;
+	constraints_initialized = true;

 	perf_pmu_register(&perf_breakpoint, "breakpoint", PERF_TYPE_BREAKPOINT);

-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1F228C43334
	for <linux-kernel@archiver.kernel.org>; Thu,  9 Jun 2022 11:31:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S243793AbiFILbR (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 9 Jun 2022 07:31:17 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54342 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S243689AbiFILbL (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 9 Jun 2022 07:31:11 -0400
Received: from mail-ej1-x649.google.com (mail-ej1-x649.google.com
 [IPv6:2a00:1450:4864:20::649])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C4653A0E67
        for <linux-kernel@vger.kernel.org>;
 Thu,  9 Jun 2022 04:31:07 -0700 (PDT)
Received: by mail-ej1-x649.google.com with SMTP id
 k7-20020a1709062a4700b006fe92440164so10873684eje.23
        for <linux-kernel@vger.kernel.org>;
 Thu, 09 Jun 2022 04:31:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=/f8j1JHjR27jI6H3SEM6tLzJpymbckkYBJ4SkEYwIq8=;
        b=LyoQ4FOMRhwn7n2IlecHdLZrsM1pYFG44BCJ7EDP/LFEezIyF4PbZZ6L1gdRa001qC
         gw2hQCy+e0sg4D+cUQ3GzF6eB0VOUCv9WMksfhBhyYQnD3a9CkO2lm6kEzfgPZr9rTlF
         c80CDzbsJULj/dfKYnJP53KLqTcoebAW+fICFgoGajhBYeTTOW2NSlhKF/FnPunjWzOt
         F5a36paLV6Sish3iefODRzGpipMD/NulwT5MQccf/AY/tMHD02muEFSgTzdxDmztdpxx
         /B3wgaH4brTDN/XrIX6759k0C4mRkf50lmQEESwwoSOdyVjBTWd9xYOwruYzwVoSqMuJ
         d5pw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=/f8j1JHjR27jI6H3SEM6tLzJpymbckkYBJ4SkEYwIq8=;
        b=uwvthHM8sZ8o2bEjSHvdW0iGrr2Z2t6DcbPI4mFdSMjirij1neMU/9VMkM6pY/gZXi
         QpA1DnldL4k9l1yJKT84JwsgHS2Hwqit5rSpdnU7WIU7v+noqKB319joMJ+c0P75Jw6A
         FFCDAT02ekMLUI/WSo7vWQFfsO9AWBQPS/dC6eFXnwEjphNjvVsHokltNm2l7jxg2XPL
         tmQIMyKAFpnMGwhPbcQ5lOiGIRKAg/URw9PaGObnLoMz4iY6FutBa70/FPV01qmfWTaw
         2jDH4Rn2ICBnvmHtM8mIdHZYj2OUI1sbJfq2fkeG6/TGaP+y5UYn4nGgXpUlPS8OhRHi
         AAjA==
X-Gm-Message-State: AOAM532DVxllWl/azk1GwhaypDdVfN+iVn1ydc2dwdS5YMXb81KbBTYK
        ynLZMXT8hhkQvqkNLEgxDLvOAee8qA==
X-Google-Smtp-Source: 
 ABdhPJwCot/NHaewmZ4pL2R0MBlmKwm761sxNJBh6QT/v8O/I0zga25h7cDWb6MxLU6b5JPBogvXe6sm+A==
X-Received: from elver.muc.corp.google.com
 ([2a00:79e0:9c:201:dcf:e5ba:10a5:1ea5])
 (user=elver job=sendgmr) by 2002:a17:906:3bd9:b0:6ff:4b5:4a8f with SMTP id
 v25-20020a1709063bd900b006ff04b54a8fmr29080565ejf.139.1654774265622; Thu, 09
 Jun 2022 04:31:05 -0700 (PDT)
Date: Thu,  9 Jun 2022 13:30:41 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-4-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 3/8] perf/hw_breakpoint: Optimize constant number of
 breakpoint slots
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Optimize internal hw_breakpoint state if the architecture's number of
breakpoint slots is constant. This avoids several kmalloc() calls, and
with them potentially unnecessary failures should those allocations
fail; it also subtly improves code generation and cache locality.

The protocol is that if an architecture defines hw_breakpoint_slots via
the preprocessor, it must be constant and the same for all types.
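The protocol can be illustrated with a self-contained sketch that mirrors the structure of the patch. The two #defines at the top simulate an arch header, the runtime slot query in the #else branch is a hypothetical stand-in for an architecture function, and a single CPU replaces the per-CPU data. Removing the hw_breakpoint_slots #define switches the sketch to the allocating path.

```c
#include <assert.h>
#include <stdlib.h>

/* Simulated <asm/hw_breakpoint.h>: a constant slot count as a macro. */
#define HBP_NUM 4
#define hw_breakpoint_slots(type) (HBP_NUM)

enum bp_type_idx { TYPE_INST, TYPE_DATA, TYPE_MAX };

#ifdef hw_breakpoint_slots
/* Constant and identical for all types, checked at compile time. */
static_assert(hw_breakpoint_slots(TYPE_INST) == hw_breakpoint_slots(TYPE_DATA),
	      "slot count must not depend on the type");

struct bp_cpuinfo {
	unsigned int tsk_pinned[hw_breakpoint_slots(0)]; /* no allocation */
};
static struct bp_cpuinfo bp_info[TYPE_MAX];

int hw_breakpoint_slots_cached(int type)
{
	(void)type; /* constant for all types */
	return hw_breakpoint_slots(type);
}

int init_breakpoint_slots(void) { return 0; }
#else
/* Hypothetical runtime query standing in for an arch function. */
static int hw_breakpoint_slots(int type) { return type == TYPE_INST ? 2 : 4; }

struct bp_cpuinfo {
	unsigned int *tsk_pinned; /* allocated at init */
};
static struct bp_cpuinfo bp_info[TYPE_MAX];
static int nr_bp_slots[TYPE_MAX];

int hw_breakpoint_slots_cached(int type) { return nr_bp_slots[type]; }

int init_breakpoint_slots(void)
{
	for (int i = 0; i < TYPE_MAX; i++) {
		nr_bp_slots[i] = hw_breakpoint_slots(i);
		bp_info[i].tsk_pinned = calloc(nr_bp_slots[i], sizeof(unsigned int));
		if (!bp_info[i].tsk_pinned) {
			while (i-- > 0) /* undo earlier allocations */
				free(bp_info[i].tsk_pinned);
			return -1;
		}
	}
	return 0;
}
#endif

/* tsk_pinned[n] counts tasks with n + 1 breakpoints of this type; return
 * the largest such n + 1, as max_task_bp_pinned() does. */
unsigned int max_task_bp_pinned(int type)
{
	for (int i = hw_breakpoint_slots_cached(type) - 1; i >= 0; i--) {
		if (bp_info[type].tsk_pinned[i] > 0)
			return i + 1;
	}
	return 0;
}
```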

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
---
 arch/sh/include/asm/hw_breakpoint.h  |  5 +-
 arch/x86/include/asm/hw_breakpoint.h |  5 +-
 kernel/events/hw_breakpoint.c        | 92 ++++++++++++++++++----------
 3 files changed, 62 insertions(+), 40 deletions(-)

diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h
index 199d17b765f2..361a0f57bdeb 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -48,10 +48,7 @@ struct pmu;
 /* Maximum number of UBC channels */
 #define HBP_NUM		2

-static inline int hw_breakpoint_slots(int type)
-{
-	return HBP_NUM;
-}
+#define hw_breakpoint_slots(type) (HBP_NUM)

 /* arch/sh/kernel/hw_breakpoint.c */
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h
index a1f0e90d0818..0bc931cd0698 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -44,10 +44,7 @@ struct arch_hw_breakpoint {
 /* Total number of available HW breakpoint registers */
 #define HBP_NUM 4

-static inline int hw_breakpoint_slots(int type)
-{
-	return HBP_NUM;
-}
+#define hw_breakpoint_slots(type) (HBP_NUM)

 struct perf_event_attr;
 struct perf_event;
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 1f718745d569..8e939723f27d 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -41,13 +41,16 @@ struct bp_cpuinfo {
 	/* Number of pinned cpu breakpoints in a cpu */
 	unsigned int	cpu_pinned;
 	/* tsk_pinned[n] is the number of tasks having n+1 breakpoints */
+#ifdef hw_breakpoint_slots
+	unsigned int	tsk_pinned[hw_breakpoint_slots(0)];
+#else
 	unsigned int	*tsk_pinned;
+#endif
 	/* Number of non-pinned cpu/task breakpoints in a cpu */
 	unsigned int	flexible; /* XXX: placeholder, see fetch_this_slot() */
 };

 static DEFINE_PER_CPU(struct bp_cpuinfo, bp_cpuinfo[TYPE_MAX]);
-static int nr_slots[TYPE_MAX] __ro_after_init;

 static struct bp_cpuinfo *get_bp_info(int cpu, enum bp_type_idx type)
 {
@@ -74,6 +77,54 @@ struct bp_busy_slots {
 /* Serialize accesses to the above constraints */
 static DEFINE_MUTEX(nr_bp_mutex);

+#ifdef hw_breakpoint_slots
+/*
+ * Number of breakpoint slots is constant, and the same for all types.
+ */
+static_assert(hw_breakpoint_slots(TYPE_INST) == hw_breakpoint_slots(TYPE_DATA));
+static inline int hw_breakpoint_slots_cached(int type)	{ return hw_breakpoint_slots(type); }
+static inline int init_breakpoint_slots(void)		{ return 0; }
+#else
+/*
+ * Dynamic number of breakpoint slots.
+ */
+static int __nr_bp_slots[TYPE_MAX] __ro_after_init;
+
+static inline int hw_breakpoint_slots_cached(int type)
+{
+	return __nr_bp_slots[type];
+}
+
+static __init int init_breakpoint_slots(void)
+{
+	int i, cpu, err_cpu;
+
+	for (i = 0; i < TYPE_MAX; i++)
+		__nr_bp_slots[i] = hw_breakpoint_slots(i);
+
+	for_each_possible_cpu(cpu) {
+		for (i = 0; i < TYPE_MAX; i++) {
+			struct bp_cpuinfo *info = get_bp_info(cpu, i);
+
+			info->tsk_pinned = kcalloc(__nr_bp_slots[i], sizeof(int), GFP_KERNEL);
+			if (!info->tsk_pinned)
+				goto err;
+		}
+	}
+
+	return 0;
+err:
+	for_each_possible_cpu(err_cpu) {
+		for (i = 0; i < TYPE_MAX; i++)
+			kfree(get_bp_info(err_cpu, i)->tsk_pinned);
+		if (err_cpu == cpu)
+			break;
+	}
+
+	return -ENOMEM;
+}
+#endif
+
 __weak int hw_breakpoint_weight(struct perf_event *bp)
 {
 	return 1;
@@ -96,7 +147,7 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
 	unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
 	int i;

-	for (i = nr_slots[type] - 1; i >= 0; i--) {
+	for (i = hw_breakpoint_slots_cached(type) - 1; i >= 0; i--) {
 		if (tsk_pinned[i] > 0)
 			return i + 1;
 	}
@@ -313,7 +364,7 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 	fetch_this_slot(&slots, weight);

 	/* Flexible counters need to keep at least one slot */
-	if (slots.pinned + (!!slots.flexible) > nr_slots[type])
+	if (slots.pinned + (!!slots.flexible) > hw_breakpoint_slots_cached(type))
 		return -ENOSPC;

 	ret = arch_reserve_bp_slot(bp);
@@ -688,42 +739,19 @@ static struct pmu perf_breakpoint = {

 int __init init_hw_breakpoint(void)
 {
-	int cpu, err_cpu;
-	int i, ret;
-
-	for (i = 0; i < TYPE_MAX; i++)
-		nr_slots[i] = hw_breakpoint_slots(i);
-
-	for_each_possible_cpu(cpu) {
-		for (i = 0; i < TYPE_MAX; i++) {
-			struct bp_cpuinfo *info = get_bp_info(cpu, i);
-
-			info->tsk_pinned = kcalloc(nr_slots[i], sizeof(int),
-							GFP_KERNEL);
-			if (!info->tsk_pinned) {
-				ret = -ENOMEM;
-				goto err;
-			}
-		}
-	}
+	int ret;

 	ret = rhltable_init(&task_bps_ht, &task_bps_ht_params);
 	if (ret)
-		goto err;
+		return ret;
+
+	ret = init_breakpoint_slots();
+	if (ret)
+		return ret;

 	constraints_initialized = true;

 	perf_pmu_register(&perf_breakpoint, "breakpoint", PERF_TYPE_BREAKPOINT);

 	return register_die_notifier(&hw_breakpoint_exceptions_nb);
-
-err:
-	for_each_possible_cpu(err_cpu) {
-		for (i =3D 0; i < TYPE_MAX; i++)
-			kfree(get_bp_info(err_cpu, i)->tsk_pinned);
-		if (err_cpu == cpu)
-			break;
-	}
-
-	return ret;
 }
-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D5781C433EF
	for <linux-kernel@archiver.kernel.org>; Thu,  9 Jun 2022 11:31:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S240123AbiFILbY (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 9 Jun 2022 07:31:24 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54356 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S243772AbiFILbN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 9 Jun 2022 07:31:13 -0400
Received: from mail-ed1-x549.google.com (mail-ed1-x549.google.com
 [IPv6:2a00:1450:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1BBA23A393E
        for <linux-kernel@vger.kernel.org>;
 Thu,  9 Jun 2022 04:31:10 -0700 (PDT)
Received: by mail-ed1-x549.google.com with SMTP id
 g7-20020a056402424700b0042dee9d11d0so16792898edb.3
        for <linux-kernel@vger.kernel.org>;
 Thu, 09 Jun 2022 04:31:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=95dYiJC9IDZol3AtuiRZW/23v+t4OmpxxOGgsAq7Fsc=;
        b=bk/INExYMBUiLgU9B9FG9Wxis71pb98dHe4VkAA3lfm2iODhPZt7k+FVxcXk/IlJQ5
         RpFX2tLyK5ayKvJnfKSmbovwQdAQ9qChX7IVHYGLvsWE83meaJOU58o4XvSilpxs1auS
         1rHmsuprC9Lwk94yHLzDR08aqrODjpQg8nX/chmzEEDz9yCpbteM4nwzxs+w96n83IeJ
         shOx0BcOzTq3E4JF/hlMco7FAyjn60LdG7vKytD2J+ZdYsrW8rFTDNPh95b3vtUVvhsA
         nTRmG/zdlPgdhI6hDiah81kGmc0I75VpdGiTUpIkvfkfJ+tkQQEQ29pWBbEWgxA305qQ
         +ZmQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=95dYiJC9IDZol3AtuiRZW/23v+t4OmpxxOGgsAq7Fsc=;
        b=Gjyt2NISWqO3iQmBWrIFoR2G1nNzIHACHnIVXTKeIOdIKX9o6oyNSYWRZdQTa/u3d6
         9fsRsgPfL4BcXCdfNWJl/GKJR2ReOXH/Z7JhZQu1FAsO5UvIWNqgPmD7vB8CTQlq1lzd
         TBY9rtaJXlIuhV/g9s1EMrli+KuDBYI7jtOQ0u01UuTqyhxR6le5lUwVLmx4s6jJG6yu
         wyxPQIHtasCCzkhXabnvtU+zTYgoGJgetsBlhEPMukv3ylv58gbRoSmfl19HnoXRpbrg
         iFUkiYTiD4iPi9N3Tz+DOqdRgJ1Pwql3s8rjzGy/8nya0ckHUpKa36WKsDIOKtYzAPUE
         XFKQ==
X-Gm-Message-State: AOAM533DRVoQ2yG5NxsuYXQ+XcqG3gJjpovlcI3AYFdf4ylOAjOyp4qt
        x9onNpsK/79zFWfw84DJF+j6yaWcyA==
X-Google-Smtp-Source: 
 ABdhPJwVUKGw0QLiFW9XPwf2r2NLAq5BjdrToZq6AokVHC66Y8e3UeznCBqRNOYwVV+V22o7YNnBpxHjfQ==
X-Received: from elver.muc.corp.google.com
 ([2a00:79e0:9c:201:dcf:e5ba:10a5:1ea5])
 (user=elver job=sendgmr) by 2002:aa7:c706:0:b0:42d:c4ad:ce0a with SMTP id
 i6-20020aa7c706000000b0042dc4adce0amr45226048edq.272.1654774268320; Thu, 09
 Jun 2022 04:31:08 -0700 (PDT)
Date: Thu,  9 Jun 2022 13:30:42 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-5-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 4/8] perf/hw_breakpoint: Make hw_breakpoint_weight() inlinable
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Because hw_breakpoint_weight() is a __weak function, the compiler must
always emit a call to it. This generates needlessly poor code (register
spills, etc.); in fact, the function shows up in profiles of
`perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512`:

    ...
    0.70%  [kernel]       [k] hw_breakpoint_weight
    ...

While the percentage is small, no architecture defines its own
hw_breakpoint_weight(), nor are there any users outside hw_breakpoint.c,
which makes __weak a poor choice here.

Change hw_breakpoint_weight()'s definition to follow a similar protocol
to hw_breakpoint_slots(), such that if <asm/hw_breakpoint.h> defines
hw_breakpoint_weight(), we'll use it instead.
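A minimal userspace sketch of this override protocol, assuming the common kernel idiom where an architecture header defines the function together with a same-named macro that generic code can test with #ifndef. The `SIMULATE_ARCH_OVERRIDE` switch, the `bp_len` field and the arch weighting policy are hypothetical; this is not the kernel's real struct perf_event.

```c
/* Hypothetical stand-in for the kernel's struct perf_event. */
struct perf_event {
	unsigned long bp_len;
};

/*
 * Simulated <asm/hw_breakpoint.h>. Build with -DSIMULATE_ARCH_OVERRIDE to
 * model an architecture providing its own weight; the self-referential
 * macro is what the generic #ifndef below detects.
 */
#ifdef SIMULATE_ARCH_OVERRIDE
static inline int hw_breakpoint_weight(struct perf_event *bp)
{
	return bp->bp_len > 8 ? 2 : 1; /* made-up policy, for illustration */
}
#define hw_breakpoint_weight hw_breakpoint_weight
#endif

/* Generic code: provide the default only if the architecture did not. */
#ifndef hw_breakpoint_weight
static inline int hw_breakpoint_weight(struct perf_event *bp)
{
	(void)bp;
	return 1;
}
#endif
```

Built without the override, every call resolves to the `static inline` default returning 1, which the compiler is free to inline, unlike a `__weak` symbol that may be replaced at link time.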

The result is that it is inlined and no longer shows up in profiles.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/hw_breakpoint.h | 1 -
 kernel/events/hw_breakpoint.c | 4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 78dd7035d1e5..9fa3547acd87 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -79,7 +79,6 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp);
 extern int dbg_release_bp_slot(struct perf_event *bp);
 extern int reserve_bp_slot(struct perf_event *bp);
 extern void release_bp_slot(struct perf_event *bp);
-int hw_breakpoint_weight(struct perf_event *bp);
 int arch_reserve_bp_slot(struct perf_event *bp);
 void arch_release_bp_slot(struct perf_event *bp);
 void arch_unregister_hw_breakpoint(struct perf_event *bp);
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 8e939723f27d..5f40c8dfa042 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -125,10 +125,12 @@ static __init int init_breakpoint_slots(void)
 }
 #endif
 
-__weak int hw_breakpoint_weight(struct perf_event *bp)
+#ifndef hw_breakpoint_weight
+static inline int hw_breakpoint_weight(struct perf_event *bp)
 {
 	return 1;
 }
+#endif
 
 static inline enum bp_type_idx find_slot_idx(u64 bp_type)
 {
-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Date: Thu,  9 Jun 2022 13:30:43 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-6-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 5/8] perf/hw_breakpoint: Remove useless code related to
 flexible breakpoints
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Flexible breakpoints have never been implemented, with
bp_cpuinfo::flexible always being 0. Unfortunately, the placeholder
still occupies 4 bytes in each bp_cpuinfo and bp_busy_slots, and
fetch_bp_busy_slots() still computes the maximum flexible count.

This again causes suboptimal code generation, even though we know that
`!!slots.flexible` is always 0.

Just get rid of the flexible "placeholder" and remove all real code
related to it. Add a note to the comment describing the constraints
algorithm, but keep flexible breakpoints in the algorithm's description,
so that if flexible breakpoints ever need to be supported, it should be
trivial to revive them (along with reverting this change).
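To illustrate why dropping the field preserves behavior, here is a simplified model of the busy-slot check. It is only a sketch: `fetch_busy_slots()`, `no_space()` and the `per_cpu_busy[]` input are hypothetical stand-ins for fetch_bp_busy_slots(), fetch_this_slot() and the check in __reserve_bp_slot().

```c
#include <stdbool.h>

/* Mirrors the patched struct: only the pinned count remains. */
struct bp_busy_slots {
	unsigned int pinned;
};

/* Simplified fetch_bp_busy_slots(): maximum busy count over all CPUs. */
static void fetch_busy_slots(struct bp_busy_slots *slots,
			     const unsigned int *per_cpu_busy, int nr_cpus)
{
	slots->pinned = 0;
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (per_cpu_busy[cpu] > slots->pinned)
			slots->pinned = per_cpu_busy[cpu];
	}
}

/*
 * Add the new breakpoint's weight (as fetch_this_slot() does) and check
 * capacity. In the old code the extra `+ !!slots.flexible` was always 0.
 */
static bool no_space(struct bp_busy_slots *slots, int weight,
		     unsigned int nr_slots)
{
	slots->pinned += weight;
	return slots->pinned > nr_slots;
}
```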

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
---
 kernel/events/hw_breakpoint.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 5f40c8dfa042..afe0a6007e96 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -46,8 +46,6 @@ struct bp_cpuinfo {
 #else
 	unsigned int	*tsk_pinned;
 #endif
-	/* Number of non-pinned cpu/task breakpoints in a cpu */
-	unsigned int	flexible; /* XXX: placeholder, see fetch_this_slot() */
 };
 
 static DEFINE_PER_CPU(struct bp_cpuinfo, bp_cpuinfo[TYPE_MAX]);
@@ -71,7 +69,6 @@ static bool constraints_initialized __ro_after_init;
 /* Gather the number of total pinned and un-pinned bp in a cpuset */
 struct bp_busy_slots {
 	unsigned int pinned;
-	unsigned int flexible;
 };
 
 /* Serialize accesses to the above constraints */
@@ -213,10 +210,6 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;
-
-		nr = info->flexible;
-		if (nr > slots->flexible)
-			slots->flexible = nr;
 	}
 }
 
@@ -299,7 +292,8 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
 }
 
 /*
- * Constraints to check before allowing this new breakpoint counter:
+ * Constraints to check before allowing this new breakpoint counter. Note that
+ * flexible breakpoints are currently unsupported -- see fetch_this_slot().
  *
  *  == Non-pinned counter == (Considered as pinned for now)
  *
@@ -366,7 +360,7 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 	fetch_this_slot(&slots, weight);
 
 	/* Flexible counters need to keep at least one slot */
-	if (slots.pinned + (!!slots.flexible) > hw_breakpoint_slots_cached(type))
+	if (slots.pinned > hw_breakpoint_slots_cached(type))
 		return -ENOSPC;
 
 	ret = arch_reserve_bp_slot(bp);
-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Date: Thu,  9 Jun 2022 13:30:44 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-7-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 6/8] perf/hw_breakpoint: Reduce contention with large number
 of tasks
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

While optimizing task_bp_pinned()'s runtime complexity to O(1) on
average helps reduce time spent in the critical section, we still suffer
from serializing everything via 'nr_bp_mutex'. Indeed, a profile shows
that contention is now the biggest issue:

    95.93%  [kernel]       [k] osq_lock
     0.70%  [kernel]       [k] mutex_spin_on_owner
     0.22%  [kernel]       [k] smp_cfm_core_cond
     0.18%  [kernel]       [k] task_bp_pinned
     0.18%  [kernel]       [k] rhashtable_jhash2
     0.15%  [kernel]       [k] queued_spin_lock_slowpath

when running the breakpoint benchmark (on a system with 256 CPUs):

 | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 4 breakpoints and 64 parallelism
 |      Total time: 0.207 [sec]
 |
 |      108.267188 usecs/op
 |     6929.100000 usecs/op/cpu

The main concern for synchronizing the breakpoint constraints data is
that a consistent snapshot of the per-CPU and per-task data is observed.

The access pattern is as follows:

 1. If the target is a task: the task's pinned breakpoints are counted,
    checked for space, and then appended to; only bp_cpuinfo::cpu_pinned
    is used to check for conflicts with CPU-only breakpoints;
    bp_cpuinfo::tsk_pinned are incremented/decremented, but otherwise
    unused.

 2. If the target is a CPU: bp_cpuinfo::cpu_pinned are counted, along
    with bp_cpuinfo::tsk_pinned; after a successful check, cpu_pinned is
    incremented. No per-task breakpoints are checked.
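The two access patterns can be sketched as the per-CPU busy count each target type computes. This is a simplified model: `struct cpu_snapshot` and `busy_on_cpu()` are hypothetical, assuming the task-target count comes from task_bp_pinned() and the CPU-target maximum from max_task_bp_pinned().

```c
#include <stdbool.h>

/* Hypothetical per-CPU snapshot for a single breakpoint type. */
struct cpu_snapshot {
	unsigned int cpu_pinned;      /* CPU-only breakpoints on this CPU */
	unsigned int max_task_pinned; /* max per-task count, from tsk_pinned */
};

/*
 * Busy slots on one CPU as seen by a new breakpoint. A task target adds
 * only its own task's count; a CPU target must assume the worst-case
 * task, so only this branch depends on tsk_pinned.
 */
static unsigned int busy_on_cpu(const struct cpu_snapshot *s,
				bool task_target, unsigned int own_task_pinned)
{
	if (task_target)
		return s->cpu_pinned + own_task_pinned;
	return s->cpu_pinned + s->max_task_pinned;
}
```

Because the task-target branch never reads `max_task_pinned`, task targets only need tsk_pinned updates not to be lost, not a stable snapshot of it.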

Since rhltable safely synchronizes insertions/deletions, we can allow
concurrency as follows:

 1. If the target is a task: independent tasks may update and check the
    constraints concurrently, but same-task target calls need to be
    serialized; since bp_cpuinfo::tsk_pinned is only updated, but not
    checked, these modifications can happen concurrently by switching
    tsk_pinned to atomic_t.

 2. If the target is a CPU: access to the per-CPU constraints needs to
    be serialized with other CPU-target and task-target callers (to
    stabilize the bp_cpuinfo::tsk_pinned snapshot).
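A userspace sketch of the tsk_pinned histogram, using C11 atomics in place of the kernel's atomic_t; `NR_SLOTS`, the `old_count` parameter and the function names are assumptions. Task-target updates only move one count between buckets, while the maximum scan is meaningful only when no updates run concurrently.

```c
#include <stdatomic.h>

#define NR_SLOTS 4 /* hypothetical number of breakpoint slots */

/* tsk_pinned[n] counts tasks with n+1 pinned breakpoints on one CPU. */
static atomic_int tsk_pinned[NR_SLOTS];

/*
 * Move one task from bucket (old_count - 1) to bucket
 * (old_count + weight - 1); weight is negative when removing. Concurrent
 * calls never lose an update, since each bucket change is one atomic op.
 */
static void toggle_task_slot(int old_count, int weight)
{
	int old_idx = old_count - 1;
	int new_idx = old_idx + weight;

	if (old_idx >= 0)
		atomic_fetch_sub(&tsk_pinned[old_idx], 1);
	if (new_idx >= 0)
		atomic_fetch_add(&tsk_pinned[new_idx], 1);
}

/*
 * Largest per-task count. Only consistent when updates are excluded
 * (the patch takes bp_cpuinfo_lock as a writer for this).
 */
static int max_task_pinned(void)
{
	for (int i = NR_SLOTS - 1; i >= 0; i--) {
		if (atomic_load(&tsk_pinned[i]) > 0)
			return i + 1;
	}
	return 0;
}
```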

We can allow the above concurrency by introducing a reader-writer lock
for the per-CPU constraints data (bp_cpuinfo_lock), and per-task
mutexes (task_sharded_mtx, sharded to avoid bloating task_struct):

  1. If the target is a task: acquire its task_sharded_mtx, then
     acquire bp_cpuinfo_lock as a reader.

  2. If the target is a CPU: acquire bp_cpuinfo_lock as a writer.

With these changes, contention with thousands of tasks is reduced to the
point where waiting on locking no longer dominates the profile:

 | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 4 breakpoints and 64 parallelism
 |      Total time: 0.080 [sec]
 |
 |       42.048437 usecs/op
 |     2691.100000 usecs/op/cpu

    21.31%  [kernel]       [k] task_bp_pinned
    17.49%  [kernel]       [k] rhashtable_jhash2
     5.29%  [kernel]       [k] toggle_bp_slot
     4.45%  [kernel]       [k] mutex_spin_on_owner
     3.72%  [kernel]       [k] bcmp

On this particular setup that's a speedup of 2.5x.

We're also getting closer to the theoretical ideal that optimizations in
hw_breakpoint.c could achieve, i.e. performance with constraints
accounting disabled:

 | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 4 breakpoints and 64 parallelism
 |      Total time: 0.067 [sec]
 |
 |       35.286458 usecs/op
 |     2258.333333 usecs/op/cpu

This means the current implementation is ~19% slower than the
theoretical ideal.

For reference, performance without any breakpoints:

 | $> perf bench -r 30 breakpoint thread -b 0 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 0 breakpoints and 64 parallelism
 |      Total time: 0.060 [sec]
 |
 |       31.365625 usecs/op
 |     2007.400000 usecs/op/cpu

On this system with 256 CPUs, the theoretical ideal is only ~12% slower
than no breakpoints at all, whereas the current implementation is ~34%
slower than no breakpoints at all.
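As a worked check, the percentages above follow from the usecs/op figures (rounded to three decimals; the ~2.5x speedup quoted earlier corresponds to a ratio of about 2.575):

```latex
\begin{aligned}
108.267 / 42.048 &\approx 2.575 && \text{(speedup from this change)}\\
42.048 / 35.286 &\approx 1.192 && \text{(current vs. ideal: } \approx 19\%\text{ slower)}\\
35.286 / 31.366 &\approx 1.125 && \text{(ideal vs. no breakpoints: } \approx 12\%\text{ slower)}\\
42.048 / 31.366 &\approx 1.341 && \text{(current vs. no breakpoints: } \approx 34\%\text{ slower)}
\end{aligned}
```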

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/events/hw_breakpoint.c | 155 ++++++++++++++++++++++++++++------
 1 file changed, 128 insertions(+), 27 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index afe0a6007e96..08c9ed0626e4 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -17,6 +17,7 @@
  * This file contains the arch-independent routines.
  */
 
+#include <linux/atomic.h>
 #include <linux/irqflags.h>
 #include <linux/kallsyms.h>
 #include <linux/notifier.h>
@@ -24,8 +25,10 @@
 #include <linux/kdebug.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/mutex.h>
 #include <linux/percpu.h>
 #include <linux/sched.h>
+#include <linux/spinlock.h>
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/rhashtable.h>
@@ -42,9 +45,9 @@ struct bp_cpuinfo {
 	unsigned int	cpu_pinned;
 	/* tsk_pinned[n] is the number of tasks having n+1 breakpoints */
 #ifdef hw_breakpoint_slots
-	unsigned int	tsk_pinned[hw_breakpoint_slots(0)];
+	atomic_t	tsk_pinned[hw_breakpoint_slots(0)];
 #else
-	unsigned int	*tsk_pinned;
+	atomic_t	*tsk_pinned;
 #endif
 };
 
@@ -71,8 +74,81 @@ struct bp_busy_slots {
 	unsigned int pinned;
 };
=20
-/* Serialize accesses to the above constraints */
-static DEFINE_MUTEX(nr_bp_mutex);
+/*
+ * Synchronizes accesses to the per-CPU constraints; users of data in bp_cpuinfo
+ * must acquire bp_cpuinfo_lock as writer to get a stable snapshot of all CPUs'
+ * constraints. Modifications without use may only acquire bp_cpuinfo_lock as a
+ * reader, but must otherwise ensure modifications are never lost.
+ */
+static DEFINE_RWLOCK(bp_cpuinfo_lock);
+
+/*
+ * Synchronizes accesses to the per-task breakpoint list in task_bps_ht. Since
+ * rhltable synchronizes concurrent insertions/deletions, independent tasks may
+ * insert/delete concurrently; therefore, a mutex per task would be sufficient.
+ *
+ * To avoid bloating task_struct with infrequently used data, use a sharded
+ * mutex that scales with number of CPUs.
+ */
+static DEFINE_PER_CPU(struct mutex, task_sharded_mtx);
+
+static struct mutex *get_task_sharded_mtx(struct perf_event *bp)
+{
+	int shard;
+
+	if (!bp->hw.target)
+		return NULL;
+
+	/*
+	 * Compute a valid shard index into per-CPU data.
+	 */
+	shard = task_pid_nr(bp->hw.target) % nr_cpu_ids;
+	shard = cpumask_next(shard - 1, cpu_possible_mask);
+	if (shard >= nr_cpu_ids)
+		shard = cpumask_first(cpu_possible_mask);
+
+	return per_cpu_ptr(&task_sharded_mtx, shard);
+}
+
+static struct mutex *bp_constraints_lock(struct perf_event *bp)
+{
+	struct mutex *mtx = get_task_sharded_mtx(bp);
+
+	if (mtx) {
+		mutex_lock(mtx);
+		read_lock(&bp_cpuinfo_lock);
+	} else {
+		write_lock(&bp_cpuinfo_lock);
+	}
+
+	return mtx;
+}
+
+static void bp_constraints_unlock(struct mutex *mtx)
+{
+	if (mtx) {
+		read_unlock(&bp_cpuinfo_lock);
+		mutex_unlock(mtx);
+	} else {
+		write_unlock(&bp_cpuinfo_lock);
+	}
+}
+
+static bool bp_constraints_is_locked(struct perf_event *bp)
+{
+	struct mutex *mtx = get_task_sharded_mtx(bp);
+
+	return (mtx ? mutex_is_locked(mtx) : false) ||
+	       rwlock_is_contended(&bp_cpuinfo_lock);
+}
+
+static inline void assert_bp_constraints_lock_held(struct perf_event *bp)
+{
+	lockdep_assert_held(&bp_cpuinfo_lock);
+	/* Don't call get_task_sharded_mtx() if lockdep is disabled. */
+	if (IS_ENABLED(CONFIG_LOCKDEP) && bp->hw.target)
+		lockdep_assert_held(get_task_sharded_mtx(bp));
+}
 
 #ifdef hw_breakpoint_slots
 /*
@@ -103,7 +179,7 @@ static __init int init_breakpoint_slots(void)
 		for (i = 0; i < TYPE_MAX; i++) {
 			struct bp_cpuinfo *info = get_bp_info(cpu, i);
 
-			info->tsk_pinned = kcalloc(__nr_bp_slots[i], sizeof(int), GFP_KERNEL);
+			info->tsk_pinned = kcalloc(__nr_bp_slots[i], sizeof(atomic_t), GFP_KERNEL);
 			if (!info->tsk_pinned)
 				goto err;
 		}
@@ -143,11 +219,19 @@ static inline enum bp_type_idx find_slot_idx(u64 bp_type)
  */
 static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
 {
-	unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
+	atomic_t *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
 	int i;
 
+	/*
+	 * At this point we want to have acquired the bp_cpuinfo_lock as a
+	 * writer to ensure that there are no concurrent writers in
+	 * toggle_bp_task_slot() to tsk_pinned, and we get a stable snapshot.
+	 */
+	lockdep_assert_held_write(&bp_cpuinfo_lock);
+
 	for (i = hw_breakpoint_slots_cached(type) - 1; i >= 0; i--) {
-		if (tsk_pinned[i] > 0)
+		ASSERT_EXCLUSIVE_WRITER(tsk_pinned[i]); /* Catch unexpected writers. */
+		if (atomic_read(&tsk_pinned[i]) > 0)
 			return i + 1;
 	}
 
@@ -164,6 +248,11 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 	struct perf_event *iter;
 	int count = 0;
 
+	/*
+	 * We need a stable snapshot of the per-task breakpoint list.
+	 */
+	assert_bp_constraints_lock_held(bp);
+
 	rcu_read_lock();
 	head = rhltable_lookup(&task_bps_ht, &bp->hw.target, task_bps_ht_params);
 	if (!head)
@@ -230,16 +319,25 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
 static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
 				enum bp_type_idx type, int weight)
 {
-	unsigned int *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
+	atomic_t *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
 	int old_idx, new_idx;
 
+	/*
+	 * If bp->hw.target, tsk_pinned is only modified, but not used
+	 * otherwise. We can permit concurrent updates as long as there are no
+	 * other uses: having acquired bp_cpuinfo_lock as a reader allows
+	 * concurrent updates here. Uses of tsk_pinned will require acquiring
+	 * bp_cpuinfo_lock as a writer to stabilize tsk_pinned's value.
+	 */
+	lockdep_assert_held_read(&bp_cpuinfo_lock);
+
 	old_idx = task_bp_pinned(cpu, bp, type) - 1;
 	new_idx = old_idx + weight;
 
 	if (old_idx >= 0)
-		tsk_pinned[old_idx]--;
+		atomic_dec(&tsk_pinned[old_idx]);
 	if (new_idx >= 0)
-		tsk_pinned[new_idx]++;
+		atomic_inc(&tsk_pinned[new_idx]);
 }
 
 /*
@@ -257,6 +355,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 
 	/* Pinned counter cpu profiling */
 	if (!bp->hw.target) {
+		lockdep_assert_held_write(&bp_cpuinfo_lock);
 		get_bp_info(bp->cpu, type)->cpu_pinned += weight;
 		return 0;
 	}
@@ -265,6 +364,11 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 	for_each_cpu(cpu, cpumask)
 		toggle_bp_task_slot(bp, cpu, type, weight);
 
+	/*
+	 * Readers want a stable snapshot of the per-task breakpoint list.
+	 */
+	assert_bp_constraints_lock_held(bp);
+
 	if (enable)
 		return rhltable_insert(&task_bps_ht, &bp->hw.bp_list, task_bps_ht_params);
 	else
@@ -372,14 +476,10 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 
 int reserve_bp_slot(struct perf_event *bp)
 {
-	int ret;
-
-	mutex_lock(&nr_bp_mutex);
-
-	ret = __reserve_bp_slot(bp, bp->attr.bp_type);
-
-	mutex_unlock(&nr_bp_mutex);
+	struct mutex *mtx = bp_constraints_lock(bp);
+	int ret = __reserve_bp_slot(bp, bp->attr.bp_type);
 
+	bp_constraints_unlock(mtx);
 	return ret;
 }
 
@@ -397,12 +497,11 @@ static void __release_bp_slot(struct perf_event *bp, u64 bp_type)
 
 void release_bp_slot(struct perf_event *bp)
 {
-	mutex_lock(&nr_bp_mutex);
+	struct mutex *mtx = bp_constraints_lock(bp);
 
 	arch_unregister_hw_breakpoint(bp);
 	__release_bp_slot(bp, bp->attr.bp_type);
-
-	mutex_unlock(&nr_bp_mutex);
+	bp_constraints_unlock(mtx);
 }
 
 static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
@@ -429,11 +528,10 @@ static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
 
 static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
 {
-	int ret;
+	struct mutex *mtx = bp_constraints_lock(bp);
+	int ret = __modify_bp_slot(bp, old_type, new_type);
 
-	mutex_lock(&nr_bp_mutex);
-	ret = __modify_bp_slot(bp, old_type, new_type);
-	mutex_unlock(&nr_bp_mutex);
+	bp_constraints_unlock(mtx);
 	return ret;
 }
 
@@ -444,7 +542,7 @@ static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
  */
 int dbg_reserve_bp_slot(struct perf_event *bp)
 {
-	if (mutex_is_locked(&nr_bp_mutex))
+	if (bp_constraints_is_locked(bp))
 		return -1;
 
 	return __reserve_bp_slot(bp, bp->attr.bp_type);
@@ -452,7 +550,7 @@ int dbg_reserve_bp_slot(struct perf_event *bp)
 
 int dbg_release_bp_slot(struct perf_event *bp)
 {
-	if (mutex_is_locked(&nr_bp_mutex))
+	if (bp_constraints_is_locked(bp))
 		return -1;
 
 	__release_bp_slot(bp, bp->attr.bp_type);
@@ -735,7 +833,10 @@ static struct pmu perf_breakpoint = {
 
 int __init init_hw_breakpoint(void)
 {
-	int ret;
+	int cpu, ret;
+
+	for_each_possible_cpu(cpu)
+		mutex_init(&per_cpu(task_sharded_mtx, cpu));
 
 	ret =3D rhltable_init(&task_bps_ht, &task_bps_ht_params);
 	if (ret)
-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Date: Thu,  9 Jun 2022 13:30:45 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-8-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 7/8] perf/hw_breakpoint: Optimize task_bp_pinned() if
 CPU-independent
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Running the perf benchmark with more aggressive parameters than in the
preceding changes (same host with 256 CPUs):

 | $> perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 100 threads with 4 breakpoints and 128 parallelism
 |      Total time: 1.953 [sec]
 |
 |       38.146289 usecs/op
 |     4882.725000 usecs/op/cpu

    16.29%  [kernel]       [k] rhashtable_jhash2
    16.19%  [kernel]       [k] osq_lock
    14.22%  [kernel]       [k] queued_spin_lock_slowpath
     8.58%  [kernel]       [k] task_bp_pinned
     8.30%  [kernel]       [k] mutex_spin_on_owner
     4.03%  [kernel]       [k] smp_cfm_core_cond
     2.97%  [kernel]       [k] toggle_bp_slot
     2.94%  [kernel]       [k] bcmp

We can see that the largest share of the time is now spent hashing task
pointers to index into task_bps_ht in task_bp_pinned().

However, if task_bp_pinned()'s computation is independent of any CPU,
i.e. always `iter->cpu < 0`, the result of each invocation will be
identical, yet it is recomputed for every CPU. This problem worsens with
increasing CPU count.

Instead, identify whether every call to task_bp_pinned() is
CPU-independent and, if so, cache the result. The existing function
becomes __task_bp_pinned(), and task_bp_pinned() now decides whether the
cached result can be used instead of calling it.
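A minimal sketch of the caching idea: if none of the task's breakpoints is bound to a specific CPU, the count is computed once and reused for every CPU. The structures and the `slow_path_calls` counter are hypothetical; in the kernel the per-task list lives in task_bps_ht, and how CPU-independence is determined there may differ.

```c
#include <stdbool.h>

/* Hypothetical, simplified breakpoint: cpu < 0 means "any CPU". */
struct bp {
	int cpu;
	int weight;
};

static int slow_path_calls; /* counts expensive (lookup-like) evaluations */

/* Model of __task_bp_pinned(): sum weights of breakpoints relevant to @cpu. */
static int count_task_bp_pinned(const struct bp *bps, int n, int cpu)
{
	int count = 0;

	slow_path_calls++;
	for (int i = 0; i < n; i++) {
		if (bps[i].cpu < 0 || bps[i].cpu == cpu)
			count += bps[i].weight;
	}
	return count;
}

struct pinned_cache {
	bool cpu_independent;
	int count;
};

/* If no breakpoint is CPU-bound, the count is the same for every CPU. */
static void pinned_cache_init(struct pinned_cache *c, const struct bp *bps, int n)
{
	c->cpu_independent = true;
	for (int i = 0; i < n; i++) {
		if (bps[i].cpu >= 0)
			c->cpu_independent = false;
	}
	c->count = c->cpu_independent ? count_task_bp_pinned(bps, n, -1) : 0;
}

static int pinned_for_cpu(const struct pinned_cache *c, const struct bp *bps,
			  int n, int cpu)
{
	if (c->cpu_independent)
		return c->count; /* reuse the cached result */
	return count_task_bp_pinned(bps, n, cpu);
}
```

With 256 CPUs, a CPU-independent task pays for one evaluation instead of 256.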

After this optimization:

    21.96%  [kernel]       [k] queued_spin_lock_slowpath
    16.39%  [kernel]       [k] osq_lock
     9.82%  [kernel]       [k] toggle_bp_slot
     9.81%  [kernel]       [k] find_next_bit
     4.93%  [kernel]       [k] mutex_spin_on_owner
     4.71%  [kernel]       [k] smp_cfm_core_cond
     4.30%  [kernel]       [k] __reserve_bp_slot
     2.65%  [kernel]       [k] cpumask_next

This shows that the time spent hashing keys has become insignificant
(rhashtable_jhash2 no longer appears among the top entries).

With the given benchmark parameters, however, we see no statistically
significant improvement in performance on the test system with 256 CPUs.
This is very likely due to the benchmark parameters being too aggressive
and contention elsewhere becoming dominant.

Indeed, when using the less aggressive parameters from the preceding
changes, we now observe:

 | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
 | # Running 'breakpoint/thread' benchmark:
 | # Created/joined 30 threads with 4 breakpoints and 64 parallelism
 |      Total time: 0.071 [sec]
 |
 |       37.134896 usecs/op
 |     2376.633333 usecs/op/cpu

This is an improvement of 12% compared to without this optimization
(baseline: 42 usecs/op). It is now only 5% slower than the theoretical
ideal (constraints disabled), and 18% slower than no breakpoints at all.
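The 12% figure follows from the two usecs/op values given above:
(42 - 37.134896) / 42 is about 0.116, i.e. roughly 11.6%, which rounds
to 12%. The hypothetical helper below only makes that arithmetic
explicit; the 5% and 18% comparisons rely on measurements not shown
here, so they are not reproduced.

```c
#include <assert.h>

/*
 * Relative improvement in percent of @optimized over @baseline, where both
 * are costs (e.g. usecs/op), so lower is better. Hypothetical helper.
 */
static double improvement_pct(double baseline, double optimized)
{
	assert(baseline > 0.0); /* a zero or negative cost is meaningless */
	return (baseline - optimized) / baseline * 100.0;
}
```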

[ While we're here, swap task_bp_pinned()'s bp and cpu arguments to be
  more consistent with other functions (which have bp first, before the
  cpu argument). ]

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/events/hw_breakpoint.c | 71 +++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 19 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 08c9ed0626e4..3b33a4075104 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -242,11 +242,22 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
  * Count the number of breakpoints of the same type and same task.
  * The given event must be not on the list.
  */
-static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
+struct task_bp_pinned {
+	/*
+	 * If @cpu_independent is true, we can avoid calling __task_bp_pinned()
+	 * for each CPU, since @count will be the same for each invocation.
+	 */
+	bool cpu_independent;
+	int count;
+	struct perf_event *bp;
+	enum bp_type_idx type;
+};
+static struct task_bp_pinned
+__task_bp_pinned(struct perf_event *bp, int cpu, enum bp_type_idx type)
 {
+	struct task_bp_pinned ret = {true, 0, bp, type};
 	struct rhlist_head *head, *pos;
 	struct perf_event *iter;
-	int count = 0;
 
 	/*
 	 * We need a stable snapshot of the per-task breakpoint list.
@@ -259,14 +270,33 @@ static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 		goto out;
 
 	rhl_for_each_entry_rcu(iter, pos, head, hw.bp_list) {
-		if (find_slot_idx(iter->attr.bp_type) == type &&
-		    (iter->cpu < 0 || cpu == iter->cpu))
-			count += hw_breakpoint_weight(iter);
+		if (find_slot_idx(iter->attr.bp_type) == type) {
+			if (iter->cpu >= 0) {
+				ret.cpu_independent = false;
+				if (cpu != iter->cpu)
+					continue;
+			}
+			ret.count += hw_breakpoint_weight(iter);
+		}
 	}
 
 out:
 	rcu_read_unlock();
-	return count;
+	return ret;
+}
+
+static int
+task_bp_pinned(struct perf_event *bp, int cpu, enum bp_type_idx type,
+	       struct task_bp_pinned *cached_tbp_pinned)
+{
+	if (cached_tbp_pinned->cpu_independent) {
+		assert_bp_constraints_lock_held(bp);
+		if (!WARN_ON(cached_tbp_pinned->bp != bp || cached_tbp_pinned->type != type))
+			return cached_tbp_pinned->count;
+	}
+
+	*cached_tbp_pinned = __task_bp_pinned(bp, cpu, type);
+	return cached_tbp_pinned->count;
 }
 
 static const struct cpumask *cpumask_of_bp(struct perf_event *bp)
@@ -281,8 +311,8 @@ static const struct cpumask *cpumask_of_bp(struct perf_event *bp)
  * a given cpu (cpu > -1) or in all of them (cpu = -1).
  */
 static void
-fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
-		    enum bp_type_idx type)
+fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp, enum bp_type_idx type,
+		    struct task_bp_pinned *cached_tbp_pinned)
 {
 	const struct cpumask *cpumask = cpumask_of_bp(bp);
 	int cpu;
@@ -295,7 +325,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!bp->hw.target)
 			nr += max_task_bp_pinned(cpu, type);
 		else
-			nr += task_bp_pinned(cpu, bp, type);
+			nr += task_bp_pinned(bp, cpu, type, cached_tbp_pinned);
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;
@@ -314,10 +344,11 @@ fetch_this_slot(struct bp_busy_slots *slots, int weight)
 }
 
 /*
- * Add a pinned breakpoint for the given task in our constraint table
+ * Add a pinned breakpoint for the given task in our constraint table.
  */
-static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
-				enum bp_type_idx type, int weight)
+static void
+toggle_bp_task_slot(struct perf_event *bp, int cpu, enum bp_type_idx type, int weight,
+		    struct task_bp_pinned *cached_tbp_pinned)
 {
 	atomic_t *tsk_pinned = get_bp_info(cpu, type)->tsk_pinned;
 	int old_idx, new_idx;
@@ -331,7 +362,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
 	 */
 	lockdep_assert_held_read(&bp_cpuinfo_lock);
 
-	old_idx = task_bp_pinned(cpu, bp, type) - 1;
+	old_idx = task_bp_pinned(bp, cpu, type, cached_tbp_pinned) - 1;
 	new_idx = old_idx + weight;
 
 	if (old_idx >= 0)
@@ -341,11 +372,11 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu,
 }
 
 /*
- * Add/remove the given breakpoint in our constraint table
+ * Add/remove the given breakpoint in our constraint table.
  */
 static int
 toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
-	       int weight)
+	       int weight, struct task_bp_pinned *cached_tbp_pinned)
 {
 	const struct cpumask *cpumask = cpumask_of_bp(bp);
 	int cpu;
@@ -362,7 +393,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
 
 	/* Pinned counter task profiling */
 	for_each_cpu(cpu, cpumask)
-		toggle_bp_task_slot(bp, cpu, type, weight);
+		toggle_bp_task_slot(bp, cpu, type, weight, cached_tbp_pinned);
 
 	/*
 	 * Readers want a stable snapshot of the per-task breakpoint list.
@@ -439,6 +470,7 @@ __weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
  */
 static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 {
+	struct task_bp_pinned cached_tbp_pinned = {};
 	struct bp_busy_slots slots = {0};
 	enum bp_type_idx type;
 	int weight;
@@ -456,7 +488,7 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 	type = find_slot_idx(bp_type);
 	weight = hw_breakpoint_weight(bp);
 
-	fetch_bp_busy_slots(&slots, bp, type);
+	fetch_bp_busy_slots(&slots, bp, type, &cached_tbp_pinned);
 	/*
 	 * Simulate the addition of this breakpoint to the constraints
 	 * and see the result.
@@ -471,7 +503,7 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type)
 	if (ret)
 		return ret;
 
-	return toggle_bp_slot(bp, true, type, weight);
+	return toggle_bp_slot(bp, true, type, weight, &cached_tbp_pinned);
 }
 
 int reserve_bp_slot(struct perf_event *bp)
@@ -485,6 +517,7 @@ int reserve_bp_slot(struct perf_event *bp)
 
 static void __release_bp_slot(struct perf_event *bp, u64 bp_type)
 {
+	struct task_bp_pinned cached_tbp_pinned = {};
 	enum bp_type_idx type;
 	int weight;
 
@@ -492,7 +525,7 @@ static void __release_bp_slot(struct perf_event *bp, u64 bp_type)
 
 	type = find_slot_idx(bp_type);
 	weight = hw_breakpoint_weight(bp);
-	WARN_ON(toggle_bp_slot(bp, false, type, weight));
+	WARN_ON(toggle_bp_slot(bp, false, type, weight, &cached_tbp_pinned));
 }
 
 void release_bp_slot(struct perf_event *bp)
-- 
2.36.1.255.ge46751e96f-goog
From nobody Wed Apr 29 09:34:26 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 98DB1C433EF
	for <linux-kernel@archiver.kernel.org>; Thu,  9 Jun 2022 11:32:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234695AbiFILcc (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 9 Jun 2022 07:32:32 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57566 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S243974AbiFILbq (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 9 Jun 2022 07:31:46 -0400
Received: from mail-ej1-x649.google.com (mail-ej1-x649.google.com
 [IPv6:2a00:1450:4864:20::649])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A6B33A81D9
        for <linux-kernel@vger.kernel.org>;
 Thu,  9 Jun 2022 04:31:21 -0700 (PDT)
Received: by mail-ej1-x649.google.com with SMTP id
 hy20-20020a1709068a7400b00703779e6f2fso10896334ejc.1
        for <linux-kernel@vger.kernel.org>;
 Thu, 09 Jun 2022 04:31:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=date:in-reply-to:message-id:mime-version:references:subject:from:to
         :cc;
        bh=bycUJydp4Is1Pgrv0uqBEliLMdGvYRKoN6kNtR19/dc=;
        b=lPOLFAzaVOjO2E4jOt3ajdFnigRsm9W5V8NrkrH81hFiV+UvGpy8BzgrUxJPkgesV4
         JHfrN4FpaAFLyyF0K+5ILXw4du2srmoNDODBSReZ0dqEEMsNlyUx+J9sOcdv336zaj8D
         WtlzYclXrCEY/MRa3dkhE2ArEhpYIFs2bTDiQSwNLa7qv+7KeLZZeav+jW98O39oDjWB
         WtWGIaUqWlr7k6ZOsucAdEX4WJ91Gr2hDXMGDLo7jig6TjQ07gN9+YmpNhuNNqX464hF
         9DIPwAlg1zMIralWSr504/4Z95jF1S2ilpCpYRzXhkFGJ9iY1e5Cgdy0HjmtbVQ3fQCF
         oEXA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:date:in-reply-to:message-id:mime-version
         :references:subject:from:to:cc;
        bh=bycUJydp4Is1Pgrv0uqBEliLMdGvYRKoN6kNtR19/dc=;
        b=R81XUb+2QnWrnojgt4Ox5taik3Bxun96g4HSC0XzC1CuTvXXORK+YRH0DqYL+pKbmT
         JuV2hNyutduv5Y743crDDj3YauUTnLJzoy+kqAcrOirM3Fp6y5k0ocWLOox60Kmo/4JT
         lCgm6sgGIlAcQ6Rk85235SVCyI6yPSU/ml1xt7/5zfg+YYX3hYgWPkv+l5minD9n+F8q
         3nkOYC3z30e4YE8EgngtnQ2DXwXme+vlTEWtkpAQreWVtESLi15HNW2sJSuAiG7kFKu8
         9xolHw6sroOtk4JmMNHQqMczn7AgFR9S0BC/FL6fu9BQEWOzWqRJYrby5G9ePAJOJlXl
         HEMQ==
X-Gm-Message-State: AOAM530XzHeaC4JBod4Ntce2PUoz/CzWXUJEOQoSkPfHIm8SNXrhUYHb
        NyECsYVBJhGMJVuG1BS/Ic8auwszBQ==
X-Google-Smtp-Source: 
 ABdhPJydpZCykPcdaFY2rxjM84Lwhm+BapZzdgw9pPl1PPzpnynWUqwblyylyqmpNsUsmyb5P8OdnbA0Lg==
X-Received: from elver.muc.corp.google.com
 ([2a00:79e0:9c:201:dcf:e5ba:10a5:1ea5])
 (user=elver job=sendgmr) by 2002:a05:6402:11:b0:431:680c:cca1 with SMTP id
 d17-20020a056402001100b00431680ccca1mr22174051edu.420.1654774279801; Thu, 09
 Jun 2022 04:31:19 -0700 (PDT)
Date: Thu,  9 Jun 2022 13:30:46 +0200
In-Reply-To: <20220609113046.780504-1-elver@google.com>
Message-Id: <20220609113046.780504-9-elver@google.com>
Mime-Version: 1.0
References: <20220609113046.780504-1-elver@google.com>
X-Mailer: git-send-email 2.36.1.255.ge46751e96f-goog
Subject: [PATCH 8/8] perf/hw_breakpoint: Clean up headers
From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Mark Rutland <mark.rutland@arm.com>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@kernel.org>,
        Dmitry Vyukov <dvyukov@google.com>,
        linux-perf-users@vger.kernel.org, x86@kernel.org,
        linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
        linux-kernel@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Clean up headers:

 - Remove unused <linux/kallsyms.h>

 - Remove unused <linux/kprobes.h>

 - Remove unused <linux/module.h>

 - Remove unused <linux/smp.h>

 - Add <linux/export.h> for EXPORT_SYMBOL_GPL().

 - Sort alphabetically.

 - Move <linux/hw_breakpoint.h> to the top to test that it compiles on its own.

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
---
 kernel/events/hw_breakpoint.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 3b33a4075104..e9aa7f2c031a 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -17,26 +17,24 @@
  * This file contains the arch-independent routines.
  */
 
+#include <linux/hw_breakpoint.h>
+
 #include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/cpu.h>
+#include <linux/export.h>
+#include <linux/init.h>
 #include <linux/irqflags.h>
-#include <linux/kallsyms.h>
-#include <linux/notifier.h>
-#include <linux/kprobes.h>
 #include <linux/kdebug.h>
 #include <linux/kernel.h>
-#include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/notifier.h>
 #include <linux/percpu.h>
+#include <linux/rhashtable.h>
 #include <linux/sched.h>
-#include <linux/spinlock.h>
-#include <linux/init.h>
 #include <linux/slab.h>
-#include <linux/rhashtable.h>
-#include <linux/cpu.h>
-#include <linux/smp.h>
-#include <linux/bug.h>
+#include <linux/spinlock.h>
 
-#include <linux/hw_breakpoint.h>
 /*
  * Constraints data
  */
-- 
2.36.1.255.ge46751e96f-goog