[PATCH 02/19] perf/hisilicon: Fix group validation

Robin Murphy posted 19 patches 1 month, 3 weeks ago
[PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Robin Murphy 1 month, 3 weeks ago
The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
a bit off, in that given a software group leader, it will consider that
event *in place of* the actual new event being opened. At worst this
could theoretically allow an unschedulable group if the software event
config happens to look like one of the hardware siblings.

The uncore framework avoids that particular issue, but all 3 also share
the common issue of not preventing racy access to the sibling list, and
some redundant checks which can be cleaned up.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/perf/hisilicon/hisi_pcie_pmu.c   | 17 ++++++-----------
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
 drivers/perf/hisilicon/hns3_pmu.c        | 17 ++++++-----------
 3 files changed, 19 insertions(+), 38 deletions(-)

diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index c5394d007b61..3b0b2f7197d0 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
 	int counters = 1;
 	int num;
 
-	event_group[0] = leader;
-	if (!is_software_event(leader)) {
-		if (leader->pmu != event->pmu)
-			return false;
+	if (leader == event)
+		return true;
 
-		if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
-			event_group[counters++] = event;
-	}
+	event_group[0] = event;
+	if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
+		event_group[counters++] = leader;
 
 	for_each_sibling_event(sibling, event->group_leader) {
-		if (is_software_event(sibling))
-			continue;
-
 		if (sibling->pmu != event->pmu)
-			return false;
+			continue;
 
 		for (num = 0; num < counters; num++) {
 			/*
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index a449651f79c9..3c531b36cf25 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
 	/* Include count for the event */
 	int counters = 1;
 
-	if (!is_software_event(leader)) {
-		/*
-		 * We must NOT create groups containing mixed PMUs, although
-		 * software events are acceptable
-		 */
-		if (leader->pmu != event->pmu)
-			return false;
+	if (leader == event)
+		return true;
 
-		/* Increment counter for the leader */
-		if (leader != event)
-			counters++;
-	}
+	/* Increment counter for the leader */
+	if (leader->pmu == event->pmu)
+		counters++;
 
 	for_each_sibling_event(sibling, event->group_leader) {
-		if (is_software_event(sibling))
-			continue;
-		if (sibling->pmu != event->pmu)
-			return false;
 		/* Increment counter for each sibling */
-		counters++;
+		if (sibling->pmu == event->pmu)
+			counters++;
 	}
 
 	/* The group can not count events more than the counters in the HW */
diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
index c157f3572cae..382e469257f9 100644
--- a/drivers/perf/hisilicon/hns3_pmu.c
+++ b/drivers/perf/hisilicon/hns3_pmu.c
@@ -1058,21 +1058,16 @@ static bool hns3_pmu_validate_event_group(struct perf_event *event)
 	int counters = 1;
 	int num;
 
-	event_group[0] = leader;
-	if (!is_software_event(leader)) {
-		if (leader->pmu != event->pmu)
-			return false;
+	if (leader == event)
+		return true;
 
-		if (leader != event && !hns3_pmu_cmp_event(leader, event))
-			event_group[counters++] = event;
-	}
+	event_group[0] = event;
+	if (leader->pmu == event->pmu && !hns3_pmu_cmp_event(leader, event))
+		event_group[counters++] = leader;
 
 	for_each_sibling_event(sibling, event->group_leader) {
-		if (is_software_event(sibling))
-			continue;
-
 		if (sibling->pmu != event->pmu)
-			return false;
+			continue;
 
 		for (num = 0; num < counters; num++) {
 			/*
-- 
2.39.2.101.g768bb238c484.dirty
Re: [PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Mark Rutland 1 month, 1 week ago
On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
> a bit off, in that given a software group leader, it will consider that
> event *in place of* the actual new event being opened. At worst this
> could theoretically allow an unschedulable group if the software event
> config happens to look like one of the hardware siblings.
> 
> The uncore framework avoids that particular issue,

What is "the uncore framework"? I'm not sure exactly what you're
referring to, nor how that composes with the problem described above.

> but all 3 also share the common issue of not preventing racy access to
> the sibling list,

Can you please elaborate on this racy access to the silbing list? I'm
not sure exactly what you're referring to.

> and some redundant checks which can be cleaned up.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/perf/hisilicon/hisi_pcie_pmu.c   | 17 ++++++-----------
>  drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
>  drivers/perf/hisilicon/hns3_pmu.c        | 17 ++++++-----------
>  3 files changed, 19 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> index c5394d007b61..3b0b2f7197d0 100644
> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
>  	int counters = 1;
>  	int num;
>  
> -	event_group[0] = leader;
> -	if (!is_software_event(leader)) {
> -		if (leader->pmu != event->pmu)
> -			return false;
> +	if (leader == event)
> +		return true;
>  
> -		if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> -			event_group[counters++] = event;
> -	}
> +	event_group[0] = event;
> +	if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> +		event_group[counters++] = leader;

Looking at this, the existing logic to share counters (which
hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
that the start/stop callbacks will reprogram the HW counters (and hence
can fight with one another).

I suspect that can be removed *entirely*, and this can be simplified
down to allocating N counters, without a quadratic event comparison.  We
don't try to share counters in other PMU drivers, and there was no
rationale for trying to do this when this wa introduced in commit:

  8404b0fbc7fbd42e ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")

The 'link' tag in that comment goes to v13, which doesn't link to prior
postings, so I'm not going to dig further.

Mark.

>  
>  	for_each_sibling_event(sibling, event->group_leader) {
> -		if (is_software_event(sibling))
> -			continue;
> -
>  		if (sibling->pmu != event->pmu)
> -			return false;
> +			continue;
>  
>  		for (num = 0; num < counters; num++) {
>  			/*
> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> index a449651f79c9..3c531b36cf25 100644
> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> @@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
>  	/* Include count for the event */
>  	int counters = 1;
>  
> -	if (!is_software_event(leader)) {
> -		/*
> -		 * We must NOT create groups containing mixed PMUs, although
> -		 * software events are acceptable
> -		 */
> -		if (leader->pmu != event->pmu)
> -			return false;
> +	if (leader == event)
> +		return true;
>  
> -		/* Increment counter for the leader */
> -		if (leader != event)
> -			counters++;
> -	}
> +	/* Increment counter for the leader */
> +	if (leader->pmu == event->pmu)
> +		counters++;
>  
>  	for_each_sibling_event(sibling, event->group_leader) {
> -		if (is_software_event(sibling))
> -			continue;
> -		if (sibling->pmu != event->pmu)
> -			return false;
>  		/* Increment counter for each sibling */
> -		counters++;
> +		if (sibling->pmu == event->pmu)
> +			counters++;
>  	}
>  
>  	/* The group can not count events more than the counters in the HW */
> diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
> index c157f3572cae..382e469257f9 100644
> --- a/drivers/perf/hisilicon/hns3_pmu.c
> +++ b/drivers/perf/hisilicon/hns3_pmu.c
> @@ -1058,21 +1058,16 @@ static bool hns3_pmu_validate_event_group(struct perf_event *event)
>  	int counters = 1;
>  	int num;
>  
> -	event_group[0] = leader;
> -	if (!is_software_event(leader)) {
> -		if (leader->pmu != event->pmu)
> -			return false;
> +	if (leader == event)
> +		return true;
>  
> -		if (leader != event && !hns3_pmu_cmp_event(leader, event))
> -			event_group[counters++] = event;
> -	}
> +	event_group[0] = event;
> +	if (leader->pmu == event->pmu && !hns3_pmu_cmp_event(leader, event))
> +		event_group[counters++] = leader;
>  
>  	for_each_sibling_event(sibling, event->group_leader) {
> -		if (is_software_event(sibling))
> -			continue;
> -
>  		if (sibling->pmu != event->pmu)
> -			return false;
> +			continue;
>  
>  		for (num = 0; num < counters; num++) {
>  			/*
> -- 
> 2.39.2.101.g768bb238c484.dirty
>
Re: [PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Robin Murphy 1 month, 1 week ago
On 2025-08-26 12:15 pm, Mark Rutland wrote:
> On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
>> The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
>> a bit off, in that given a software group leader, it will consider that
>> event *in place of* the actual new event being opened. At worst this
>> could theoretically allow an unschedulable group if the software event
>> config happens to look like one of the hardware siblings.
>>
>> The uncore framework avoids that particular issue,
> 
> What is "the uncore framework"? I'm not sure exactly what you're
> referring to, nor how that composes with the problem described above.

Literally that hisi_uncore_pmu.c is actually a framework for half a 
dozen individual sub-drivers rather than a "driver" itself per se, but I 
suppose that detail doesn't strictly matter at this level.

>> but all 3 also share the common issue of not preventing racy access to
>> the sibling list,
> 
> Can you please elaborate on this racy access to the silbing list? I'm
> not sure exactly what you're referring to.

Hmm, yes, I guess an actual race is probably impossible since if we're 
still in the middle of opening the group leader event then we haven't 
yet allocated the fd that userspace would need to start adding siblings, 
even if it tried to guess. I leaned on "racy" as a concise way to infer 
"when it isn't locked (even though the reasons for that are more 
subtle)" repeatedly over several patches - after all, the overall theme 
of this series is that I dislike repetitive boilerplate :)

I'll dedicate some time for polishing commit messages for v2, especially 
the common context for these "part 1" patches per your feedback on patch #1.

>> and some redundant checks which can be cleaned up.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/perf/hisilicon/hisi_pcie_pmu.c   | 17 ++++++-----------
>>   drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
>>   drivers/perf/hisilicon/hns3_pmu.c        | 17 ++++++-----------
>>   3 files changed, 19 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
>> index c5394d007b61..3b0b2f7197d0 100644
>> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
>> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
>> @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
>>   	int counters = 1;
>>   	int num;
>>   
>> -	event_group[0] = leader;
>> -	if (!is_software_event(leader)) {
>> -		if (leader->pmu != event->pmu)
>> -			return false;
>> +	if (leader == event)
>> +		return true;
>>   
>> -		if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
>> -			event_group[counters++] = event;
>> -	}
>> +	event_group[0] = event;
>> +	if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
>> +		event_group[counters++] = leader;
> 
> Looking at this, the existing logic to share counters (which
> hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> that the start/stop callbacks will reprogram the HW counters (and hence
> can fight with one another).

Yeah, this had a dodgy smell when I first came across it, but after 
doing all the digging I think it does actually work out - the trick 
seems to be the group_leader check in hisi_pcie_pmu_get_event_idx(), 
with the implication the PMU is going to be stopped while scheduling 
in/out the whole group, so assuming hisi_pcie_pmu_del() doesn't clear 
the counter value in hardware (even though the first call nukes the rest 
of the event configuration), then the events should stay in sync.

It does seem somewhat nonsensical to have multiple copies of the same 
event in the same group, but I imagine it could happen with some sort of 
scripted combination of metrics, and supporting it at this level saves 
needing explicit deduplication further up. So even though my initial 
instinct was to rip it out too, in the end I concluded that that doesn't 
seem justified.

Thanks,
Robin.

> I suspect that can be removed *entirely*, and this can be simplified
> down to allocating N counters, without a quadratic event comparison.  We
> don't try to share counters in other PMU drivers, and there was no
> rationale for trying to do this when this wa introduced in commit:
> 
>    8404b0fbc7fbd42e ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> 
> The 'link' tag in that comment goes to v13, which doesn't link to prior
> postings, so I'm not going to dig further.
> 
> Mark.
> 
>>   
>>   	for_each_sibling_event(sibling, event->group_leader) {
>> -		if (is_software_event(sibling))
>> -			continue;
>> -
>>   		if (sibling->pmu != event->pmu)
>> -			return false;
>> +			continue;
>>   
>>   		for (num = 0; num < counters; num++) {
>>   			/*
>> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> index a449651f79c9..3c531b36cf25 100644
>> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> @@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
>>   	/* Include count for the event */
>>   	int counters = 1;
>>   
>> -	if (!is_software_event(leader)) {
>> -		/*
>> -		 * We must NOT create groups containing mixed PMUs, although
>> -		 * software events are acceptable
>> -		 */
>> -		if (leader->pmu != event->pmu)
>> -			return false;
>> +	if (leader == event)
>> +		return true;
>>   
>> -		/* Increment counter for the leader */
>> -		if (leader != event)
>> -			counters++;
>> -	}
>> +	/* Increment counter for the leader */
>> +	if (leader->pmu == event->pmu)
>> +		counters++;
>>   
>>   	for_each_sibling_event(sibling, event->group_leader) {
>> -		if (is_software_event(sibling))
>> -			continue;
>> -		if (sibling->pmu != event->pmu)
>> -			return false;
>>   		/* Increment counter for each sibling */
>> -		counters++;
>> +		if (sibling->pmu == event->pmu)
>> +			counters++;
>>   	}
>>   
>>   	/* The group can not count events more than the counters in the HW */
>> diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
>> index c157f3572cae..382e469257f9 100644
>> --- a/drivers/perf/hisilicon/hns3_pmu.c
>> +++ b/drivers/perf/hisilicon/hns3_pmu.c
>> @@ -1058,21 +1058,16 @@ static bool hns3_pmu_validate_event_group(struct perf_event *event)
>>   	int counters = 1;
>>   	int num;
>>   
>> -	event_group[0] = leader;
>> -	if (!is_software_event(leader)) {
>> -		if (leader->pmu != event->pmu)
>> -			return false;
>> +	if (leader == event)
>> +		return true;
>>   
>> -		if (leader != event && !hns3_pmu_cmp_event(leader, event))
>> -			event_group[counters++] = event;
>> -	}
>> +	event_group[0] = event;
>> +	if (leader->pmu == event->pmu && !hns3_pmu_cmp_event(leader, event))
>> +		event_group[counters++] = leader;
>>   
>>   	for_each_sibling_event(sibling, event->group_leader) {
>> -		if (is_software_event(sibling))
>> -			continue;
>> -
>>   		if (sibling->pmu != event->pmu)
>> -			return false;
>> +			continue;
>>   
>>   		for (num = 0; num < counters; num++) {
>>   			/*
>> -- 
>> 2.39.2.101.g768bb238c484.dirty
>>
Re: [PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Mark Rutland 1 month, 1 week ago
On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
> > > a bit off, in that given a software group leader, it will consider that
> > > event *in place of* the actual new event being opened. At worst this
> > > could theoretically allow an unschedulable group if the software event
> > > config happens to look like one of the hardware siblings.
> > > 
> > > The uncore framework avoids that particular issue,
> > 
> > What is "the uncore framework"? I'm not sure exactly what you're
> > referring to, nor how that composes with the problem described above.
> 
> Literally that hisi_uncore_pmu.c is actually a framework for half a dozen
> individual sub-drivers rather than a "driver" itself per se, but I suppose
> that detail doesn't strictly matter at this level.

I see. My concern was just that I couldn't figure out what "the uncore
framework" was, since it sounded more generic. If you say something like
"the shared code in hisi_uncore_pmu.c", I think that would be clearer.

> > > but all 3 also share the common issue of not preventing racy access to
> > > the sibling list,
> > 
> > Can you please elaborate on this racy access to the silbing list? I'm
> > not sure exactly what you're referring to.
> 
> Hmm, yes, I guess an actual race is probably impossible since if we're still
> in the middle of opening the group leader event then we haven't yet
> allocated the fd that userspace would need to start adding siblings, even if
> it tried to guess. I leaned on "racy" as a concise way to infer "when it
> isn't locked (even though the reasons for that are more subtle)" repeatedly
> over several patches - after all, the overall theme of this series is that I
> dislike repetitive boilerplate :)
> 
> I'll dedicate some time for polishing commit messages for v2, especially the
> common context for these "part 1" patches per your feedback on patch #1.

Thanks!

> > > and some redundant checks which can be cleaned up.
> > > 
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >   drivers/perf/hisilicon/hisi_pcie_pmu.c   | 17 ++++++-----------
> > >   drivers/perf/hisilicon/hisi_uncore_pmu.c | 23 +++++++----------------
> > >   drivers/perf/hisilicon/hns3_pmu.c        | 17 ++++++-----------
> > >   3 files changed, 19 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > index c5394d007b61..3b0b2f7197d0 100644
> > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > >   	int counters = 1;
> > >   	int num;
> > > -	event_group[0] = leader;
> > > -	if (!is_software_event(leader)) {
> > > -		if (leader->pmu != event->pmu)
> > > -			return false;
> > > +	if (leader == event)
> > > +		return true;
> > > -		if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > -			event_group[counters++] = event;
> > > -	}
> > > +	event_group[0] = event;
> > > +	if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > +		event_group[counters++] = leader;
> > 
> > Looking at this, the existing logic to share counters (which
> > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > that the start/stop callbacks will reprogram the HW counters (and hence
> > can fight with one another).
> 
> Yeah, this had a dodgy smell when I first came across it, but after doing
> all the digging I think it does actually work out - the trick seems to be
> the group_leader check in hisi_pcie_pmu_get_event_idx(), with the
> implication the PMU is going to be stopped while scheduling in/out the whole
> group, so assuming hisi_pcie_pmu_del() doesn't clear the counter value in
> hardware (even though the first call nukes the rest of the event
> configuration), then the events should stay in sync.

I don't think that's sufficient. If nothing else, overflow is handled
per-event, and for a group of two identical events, upon overflow
hisi_pcie_pmu_irq() will reprogram the shared HW counter when handling
the first event, and the second event will see an arbitrary
discontinuity. Maybe no-one has spotted that due to the 2^63 counter
period that we program, but this is clearly bogus.

In addition, AFAICT the IRQ handler doesn't stop the PMU, so in general
groups aren't handled atomically, and snapshots of the counters won't be
atomic.

> It does seem somewhat nonsensical to have multiple copies of the same event
> in the same group, but I imagine it could happen with some sort of scripted
> combination of metrics, and supporting it at this level saves needing
> explicit deduplication further up. So even though my initial instinct was to
> rip it out too, in the end I concluded that that doesn't seem justified.

As above, I think it's clearly bogus. I don't think we should have
merged it as-is and it's not something I'd like to see others copy.
Other PMUs don't do this sort of event deduplication, and in general it
should be up to the user or userspace software to do that rather than
doing that badly in the kernel.

Given it was implemented with no rationale I think we should rip it out.
If that breaks someone's scripting, then we can consider implementing
something that actually works.

Mark.
Re: [PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Mark Rutland 1 month, 1 week ago
On Tue, Aug 26, 2025 at 04:31:23PM +0100, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> > On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:

> > > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > index c5394d007b61..3b0b2f7197d0 100644
> > > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > >   	int counters = 1;
> > > >   	int num;
> > > > -	event_group[0] = leader;
> > > > -	if (!is_software_event(leader)) {
> > > > -		if (leader->pmu != event->pmu)
> > > > -			return false;
> > > > +	if (leader == event)
> > > > +		return true;
> > > > -		if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > -			event_group[counters++] = event;
> > > > -	}
> > > > +	event_group[0] = event;
> > > > +	if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > +		event_group[counters++] = leader;
> > > 
> > > Looking at this, the existing logic to share counters (which
> > > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > > that the start/stop callbacks will reprogram the HW counters (and hence
> > > can fight with one another).

> > It does seem somewhat nonsensical to have multiple copies of the same event
> > in the same group, but I imagine it could happen with some sort of scripted
> > combination of metrics, and supporting it at this level saves needing
> > explicit deduplication further up. So even though my initial instinct was to
> > rip it out too, in the end I concluded that that doesn't seem justified.
> 
> As above, I think it's clearly bogus. I don't think we should have
> merged it as-is and it's not something I'd like to see others copy.
> Other PMUs don't do this sort of event deduplication, and in general it
> should be up to the user or userspace software to do that rather than
> doing that badly in the kernel.
> 
> Given it was implemented with no rationale I think we should rip it out.
> If that breaks someone's scripting, then we can consider implementing
> something that actually works.

Having dug some more, I see that this was intended to handle the way
the hardware shares a single config register between pairs of counter
and counter_ext registers, with the idea being that two related events
could be allocated into the same counter pair (but would only occupy a
single counter each).

I still think the code is wrong, but it is more complex than I made it
out to be, and you're right that we should leave it as-is for now. I can
follow up after we've got this series in.

Mark.
Re: [PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Mark Rutland 1 month, 1 week ago
On Tue, Aug 26, 2025 at 04:31:23PM +0100, Mark Rutland wrote:
> On Tue, Aug 26, 2025 at 03:35:48PM +0100, Robin Murphy wrote:
> > On 2025-08-26 12:15 pm, Mark Rutland wrote:
> > > On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > index c5394d007b61..3b0b2f7197d0 100644
> > > > --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> > > > @@ -338,21 +338,16 @@ static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
> > > >   	int counters = 1;
> > > >   	int num;
> > > > -	event_group[0] = leader;
> > > > -	if (!is_software_event(leader)) {
> > > > -		if (leader->pmu != event->pmu)
> > > > -			return false;
> > > > +	if (leader == event)
> > > > +		return true;
> > > > -		if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > -			event_group[counters++] = event;
> > > > -	}
> > > > +	event_group[0] = event;
> > > > +	if (leader->pmu == event->pmu && !hisi_pcie_pmu_cmp_event(leader, event))
> > > > +		event_group[counters++] = leader;
> > > 
> > > Looking at this, the existing logic to share counters (which
> > > hisi_pcie_pmu_cmp_event() is trying to permit) looks to be bogus, given
> > > that the start/stop callbacks will reprogram the HW counters (and hence
> > > can fight with one another).
> > 
> > Yeah, this had a dodgy smell when I first came across it, but after doing
> > all the digging I think it does actually work out - the trick seems to be
> > the group_leader check in hisi_pcie_pmu_get_event_idx(), with the
> > implication the PMU is going to be stopped while scheduling in/out the whole
> > group, so assuming hisi_pcie_pmu_del() doesn't clear the counter value in
> > hardware (even though the first call nukes the rest of the event
> > configuration), then the events should stay in sync.
> 
> I don't think that's sufficient. If nothing else, overflow is handled
> per-event, and for a group of two identical events, upon overflow
> hisi_pcie_pmu_irq() will reprogram the shared HW counter when handling
> the first event, and the second event will see an arbitrary
> discontinuity. Maybe no-one has spotted that due to the 2^63 counter
> period that we program, but this is clearly bogus.
> 
> In addition, AFAICT the IRQ handler doesn't stop the PMU, so in general
> groups aren't handled atomically, and snapshots of the counters won't be
> atomic.
> 
> > It does seem somewhat nonsensical to have multiple copies of the same event
> > in the same group, but I imagine it could happen with some sort of scripted
> > combination of metrics, and supporting it at this level saves needing
> > explicit deduplication further up. So even though my initial instinct was to
> > rip it out too, in the end I concluded that that doesn't seem justified.
> 

[...]

> As above, I think it's clearly bogus. I don't think we should have
> merged it as-is and it's not something I'd like to see others copy.
> Other PMUs don't do this sort of event deduplication, and in general it
> should be up to the user or userspace software to do that rather than
> doing that badly in the kernel.
> 
> Given it was implemented with no rationale I think we should rip it out.
> If that breaks someone's scripting, then we can consider implementing
> something that actually works.

FWIW, I'm happy to go do that as a follow-up, so if that's a pain, feel
free to leave that as-is for now.

Mark.
Re: [PATCH 02/19] perf/hisilicon: Fix group validation
Posted by Mark Rutland 1 month, 1 week ago
On Tue, Aug 26, 2025 at 12:15:23PM +0100, Mark Rutland wrote:
> On Wed, Aug 13, 2025 at 06:00:54PM +0100, Robin Murphy wrote:
> > The group validation logic shared by the HiSilicon HNS3/PCIe drivers is
> > a bit off, in that given a software group leader, it will consider that
> > event *in place of* the actual new event being opened. At worst this
> > could theoretically allow an unschedulable group if the software event
> > config happens to look like one of the hardware siblings.
> > 
> > The uncore framework avoids that particular issue,
> 
> What is "the uncore framework"? I'm not sure exactly what you're
> referring to, nor how that composes with the problem described above.
> 
> > but all 3 also share the common issue of not preventing racy access to
> > the sibling list,
> 
> Can you please elaborate on this racy access to the silbing list? I'm
> not sure exactly what you're referring to.

Ah, I think you're referring to the issue in:

  https://lore.kernel.org/linux-arm-kernel/Zg0l642PgQ7T3a8Z@FVFF77S0Q05N/

... where when creatign a new event which is its own group leader,
lockdep_assert_event_ctx(event) fires in for_each_sibling_event(),
because the new event's context isn't locked...

> > diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > index a449651f79c9..3c531b36cf25 100644
> > --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > @@ -101,26 +101,17 @@ static bool hisi_validate_event_group(struct perf_event *event)
> >  	/* Include count for the event */
> >  	int counters = 1;
> >  
> > -	if (!is_software_event(leader)) {
> > -		/*
> > -		 * We must NOT create groups containing mixed PMUs, although
> > -		 * software events are acceptable
> > -		 */
> > -		if (leader->pmu != event->pmu)
> > -			return false;
> > +	if (leader == event)
> > +		return true;

... and hence bailing out here avoids that?

It's not strictly "racy access to the sibling list", becuase there's
nothing else accessing the list; it's just that this is the simplest way
to appease lockdep while avoiding false negatives.

It'd probably be better to say something like "the common issue of
calling for_each_sibling_event() when initialising a new group leader",
and maybe to spell that out a bit.

Mark.