perf cpumap: Correct for perf_cpu_map__merge()

[PATCH v1 1/5] libperf cpumap: Correct reference count for perf_cpu_map__merge()

Posted by Leo Yan 2 months ago

The perf_cpu_map__merge() function has two arguments, 'orig' and 'other',
and it returns results for three different cases:

  Case 1: 'other' is a subset of 'orig'.
  Case 2: 'orig' is a subset of 'other'.
  Case 3: 'orig' and 'other' are not subsets of each other.

The result combinations are:

  +--------+-------------+-----------+-----------+
  | Cases  | Result      | orig      | other     |
  +--------+-------------+-----------+-----------+
  | Case1  | orig        | No change | No change |
  +--------+-------------+-----------+-----------+
  | Case2  | other       | No change | refcnt++  |
  +--------+-------------+-----------+-----------+
  | Case3  | New CPU map | refcnt--  | No change |
  +--------+-------------+-----------+-----------+

Both Case 1 and Case 3 have a risk of releasing maps unexpectedly. This
is because the reference counter operations are not consistent crossing
different cases and leads to difficulty for callers handling them.

For Case 1, because 'other' is a subset of 'orig', 'orig' is returned as
the merging result, but its refcnt is not incremented. This can lead to
the map being released repeatedly:

  struct perf_cpu_map *a = perf_cpu_map__new("1,2");
  struct perf_cpu_map *b = perf_cpu_map__new("2");

  /* 'm' and 'a' point to the same CPU map */
  struct perf_cpu_map *m = perf_cpu_map__merge(a, b);

  ...

  perf_cpu_map__put(m); -> Release the map
  perf_cpu_map__put(b);
  perf_cpu_map__put(a); -> Release the same merged again

For Case 3, it is possible that the CPU map pointed to by 'orig' can be
released twice: within the function and outside of it.

  struct perf_cpu_map *a = perf_cpu_map__new("1,2");
  struct perf_cpu_map *b = perf_cpu_map__new("3");

  struct perf_cpu_map *m = perf_cpu_map__merge(a, b);
                             `> 'm' is new allocated map. But 'a' has
			        been released in the function.
  ...

  perf_cpu_map__put(m);
  perf_cpu_map__put(b);
  perf_cpu_map__put(a); -> Release the 'a' map again

This commit increases the reference counter if a passed map is returned.
For the case of a newly allocated map, it does not change the reference
counter for passed maps.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/lib/perf/cpumap.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index cae799ad44e1..3f80eade8b1c 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -438,9 +438,7 @@ bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu
 /*
  * Merge two cpumaps
  *
- * orig either gets freed and replaced with a new map, or reused
- * with no reference count change (similar to "realloc")
- * other has its reference count increased.
+ * Return a new map, or reused with its reference count increased.
  */
 
 struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
@@ -452,11 +450,9 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
 	struct perf_cpu_map *merged;
 
 	if (perf_cpu_map__is_subset(orig, other))
-		return orig;
-	if (perf_cpu_map__is_subset(other, orig)) {
-		perf_cpu_map__put(orig);
+		return perf_cpu_map__get(orig);
+	if (perf_cpu_map__is_subset(other, orig))
 		return perf_cpu_map__get(other);
-	}
 
 	tmp_len = __perf_cpu_map__nr(orig) + __perf_cpu_map__nr(other);
 	tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
@@ -483,7 +479,6 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
 
 	merged = cpu_map__trim_new(k, tmp_cpus);
 	free(tmp_cpus);
-	perf_cpu_map__put(orig);
 	return merged;
 }
 
-- 
2.34.1

Re: [PATCH v1 1/5] libperf cpumap: Correct reference count for perf_cpu_map__merge()

Posted by Adrian Hunter 1 month, 2 weeks ago

On 25/09/24 22:53, Leo Yan wrote:
> The perf_cpu_map__merge() function has two arguments, 'orig' and 'other',
> and it returns results for three different cases:
> 
>   Case 1: 'other' is a subset of 'orig'.
>   Case 2: 'orig' is a subset of 'other'.
>   Case 3: 'orig' and 'other' are not subsets of each other.
> 
> The result combinations are:
> 
>   +--------+-------------+-----------+-----------+
>   | Cases  | Result      | orig      | other     |
>   +--------+-------------+-----------+-----------+
>   | Case1  | orig        | No change | No change |
>   +--------+-------------+-----------+-----------+
>   | Case2  | other       | No change | refcnt++  |
>   +--------+-------------+-----------+-----------+
>   | Case3  | New CPU map | refcnt--  | No change |
>   +--------+-------------+-----------+-----------+
> 
> Both Case 1 and Case 3 have a risk of releasing maps unexpectedly. This
> is because the reference counter operations are not consistent crossing
> different cases and leads to difficulty for callers handling them.
> 
> For Case 1, because 'other' is a subset of 'orig', 'orig' is returned as
> the merging result, but its refcnt is not incremented. This can lead to
> the map being released repeatedly:
> 
>   struct perf_cpu_map *a = perf_cpu_map__new("1,2");
>   struct perf_cpu_map *b = perf_cpu_map__new("2");
> 
>   /* 'm' and 'a' point to the same CPU map */
>   struct perf_cpu_map *m = perf_cpu_map__merge(a, b);
> 
>   ...
> 
>   perf_cpu_map__put(m); -> Release the map
>   perf_cpu_map__put(b);
>   perf_cpu_map__put(a); -> Release the same merged again
> 
> For Case 3, it is possible that the CPU map pointed to by 'orig' can be
> released twice: within the function and outside of it.
> 
>   struct perf_cpu_map *a = perf_cpu_map__new("1,2");
>   struct perf_cpu_map *b = perf_cpu_map__new("3");
> 
>   struct perf_cpu_map *m = perf_cpu_map__merge(a, b);
>                              `> 'm' is new allocated map. But 'a' has
> 			        been released in the function.
>   ...
> 
>   perf_cpu_map__put(m);
>   perf_cpu_map__put(b);
>   perf_cpu_map__put(a); -> Release the 'a' map again
> 
> This commit increases the reference counter if a passed map is returned.
> For the case of a newly allocated map, it does not change the reference
> counter for passed maps.

The 2 non-test uses of perf_cpu_map__merge both do:

	a = perf_cpu_map__merge(a, b);

so another way to make the API less misleading would be
to introduce:

	err = perf_cpu_map__merge_in(&a, b);

where:

int perf_cpu_map__merge_in(struct perf_cpu_map **orig, struct perf_cpu_map *other)
{
	struct perf_cpu_map *result = perf_cpu_map__merge(*orig, other);

	if (!result)
		return -ENOMEM;

	*orig = result;
	return 0;
}

without any changes to perf_cpu_map__merge().

> 
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
>  tools/lib/perf/cpumap.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> index cae799ad44e1..3f80eade8b1c 100644
> --- a/tools/lib/perf/cpumap.c
> +++ b/tools/lib/perf/cpumap.c
> @@ -438,9 +438,7 @@ bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu
>  /*
>   * Merge two cpumaps
>   *
> - * orig either gets freed and replaced with a new map, or reused
> - * with no reference count change (similar to "realloc")
> - * other has its reference count increased.
> + * Return a new map, or reused with its reference count increased.
>   */
>  
>  struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> @@ -452,11 +450,9 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>  	struct perf_cpu_map *merged;
>  
>  	if (perf_cpu_map__is_subset(orig, other))
> -		return orig;
> -	if (perf_cpu_map__is_subset(other, orig)) {
> -		perf_cpu_map__put(orig);
> +		return perf_cpu_map__get(orig);
> +	if (perf_cpu_map__is_subset(other, orig))
>  		return perf_cpu_map__get(other);
> -	}
>  
>  	tmp_len = __perf_cpu_map__nr(orig) + __perf_cpu_map__nr(other);
>  	tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> @@ -483,7 +479,6 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>  
>  	merged = cpu_map__trim_new(k, tmp_cpus);
>  	free(tmp_cpus);
> -	perf_cpu_map__put(orig);
>  	return merged;
>  }
>

Re: [PATCH v1 1/5] libperf cpumap: Correct reference count for perf_cpu_map__merge()

Posted by Leo Yan 1 month, 2 weeks ago

Hi Adrian,

On 10/10/24 18:41, Adrian Hunter wrote:>
> On 25/09/24 22:53, Leo Yan wrote:
>> The perf_cpu_map__merge() function has two arguments, 'orig' and 'other',
>> and it returns results for three different cases:
>>
>>    Case 1: 'other' is a subset of 'orig'.
>>    Case 2: 'orig' is a subset of 'other'.
>>    Case 3: 'orig' and 'other' are not subsets of each other.
>>
>> The result combinations are:
>>
>>    +--------+-------------+-----------+-----------+
>>    | Cases  | Result      | orig      | other     |
>>    +--------+-------------+-----------+-----------+
>>    | Case1  | orig        | No change | No change |
>>    +--------+-------------+-----------+-----------+
>>    | Case2  | other       | No change | refcnt++  |
>>    +--------+-------------+-----------+-----------+
>>    | Case3  | New CPU map | refcnt--  | No change |
>>    +--------+-------------+-----------+-----------+
>>
>> Both Case 1 and Case 3 have a risk of releasing maps unexpectedly. This
>> is because the reference counter operations are not consistent crossing
>> different cases and leads to difficulty for callers handling them.
>>
>> For Case 1, because 'other' is a subset of 'orig', 'orig' is returned as
>> the merging result, but its refcnt is not incremented. This can lead to
>> the map being released repeatedly:
>>
>>    struct perf_cpu_map *a = perf_cpu_map__new("1,2");
>>    struct perf_cpu_map *b = perf_cpu_map__new("2");
>>
>>    /* 'm' and 'a' point to the same CPU map */
>>    struct perf_cpu_map *m = perf_cpu_map__merge(a, b);
>>
>>    ...
>>
>>    perf_cpu_map__put(m); -> Release the map
>>    perf_cpu_map__put(b);
>>    perf_cpu_map__put(a); -> Release the same merged again
>>
>> For Case 3, it is possible that the CPU map pointed to by 'orig' can be
>> released twice: within the function and outside of it.
>>
>>    struct perf_cpu_map *a = perf_cpu_map__new("1,2");
>>    struct perf_cpu_map *b = perf_cpu_map__new("3");
>>
>>    struct perf_cpu_map *m = perf_cpu_map__merge(a, b);
>>                               `> 'm' is new allocated map. But 'a' has
>>                                been released in the function.
>>    ...
>>
>>    perf_cpu_map__put(m);
>>    perf_cpu_map__put(b);
>>    perf_cpu_map__put(a); -> Release the 'a' map again
>>
>> This commit increases the reference counter if a passed map is returned.
>> For the case of a newly allocated map, it does not change the reference
>> counter for passed maps.
> 
> The 2 non-test uses of perf_cpu_map__merge both do:
> 
>          a = perf_cpu_map__merge(a, b);
> 
> so another way to make the API less misleading would be
> to introduce:
> 
>          err = perf_cpu_map__merge_in(&a, b);
> 
> where:
> 
> int perf_cpu_map__merge_in(struct perf_cpu_map **orig, struct perf_cpu_map *other)
> {
>          struct perf_cpu_map *result = perf_cpu_map__merge(*orig, other);
> 
>          if (!result)
>                  return -ENOMEM;
> 
>          *orig = result;
>          return 0;
> }
> 
> without any changes to perf_cpu_map__merge().

Just wandering why we cannot do the same thing for the perf_cpu_map__merge() 
function?

   int perf_cpu_map__merge_in(struct perf_cpu_map **orig,
                              struct perf_cpu_map *other)

This can allow us to avoid any confusion in the first place. And we don't need 
to maintain two functions for the same thing.

Thanks,
Leo

Re: [PATCH v1 1/5] libperf cpumap: Correct reference count for perf_cpu_map__merge()

Posted by Leo Yan 1 month, 2 weeks ago


On 10/11/24 10:34, Leo Yan wrote:

>> The 2 non-test uses of perf_cpu_map__merge both do:
>>
>>          a = perf_cpu_map__merge(a, b);
>>
>> so another way to make the API less misleading would be
>> to introduce:
>>
>>          err = perf_cpu_map__merge_in(&a, b);
>>
>> where:
>>
>> int perf_cpu_map__merge_in(struct perf_cpu_map **orig, struct perf_cpu_map 
>> *other)
>> {
>>          struct perf_cpu_map *result = perf_cpu_map__merge(*orig, other);
>>
>>          if (!result)
>>                  return -ENOMEM;
>>
>>          *orig = result;
>>          return 0;
>> }
>>
>> without any changes to perf_cpu_map__merge().
> 
> Just wandering why we cannot do the same thing for the perf_cpu_map__merge()
> function?
> 
>    int perf_cpu_map__merge_in(struct perf_cpu_map **orig,
>                               struct perf_cpu_map *other)

Sorry for typo and spamming. The above suggested definition is for 
perf_cpu_map__merge().


> This can allow us to avoid any confusion in the first place. And we don't need
> to maintain two functions for the same thing.
> 
> Thanks,
> Leo
> 
>

Re: [PATCH v1 1/5] libperf cpumap: Correct reference count for perf_cpu_map__merge()

Posted by Adrian Hunter 1 month, 2 weeks ago

On 11/10/24 12:40, Leo Yan wrote:
> 
> 
> On 10/11/24 10:34, Leo Yan wrote:
> 
>>> The 2 non-test uses of perf_cpu_map__merge both do:
>>>
>>>          a = perf_cpu_map__merge(a, b);
>>>
>>> so another way to make the API less misleading would be
>>> to introduce:
>>>
>>>          err = perf_cpu_map__merge_in(&a, b);
>>>
>>> where:
>>>
>>> int perf_cpu_map__merge_in(struct perf_cpu_map **orig, struct perf_cpu_map *other)
>>> {
>>>          struct perf_cpu_map *result = perf_cpu_map__merge(*orig, other);
>>>
>>>          if (!result)
>>>                  return -ENOMEM;
>>>
>>>          *orig = result;
>>>          return 0;
>>> }
>>>
>>> without any changes to perf_cpu_map__merge().
>>
>> Just wandering why we cannot do the same thing for the perf_cpu_map__merge()
>> function?
>>
>>    int perf_cpu_map__merge_in(struct perf_cpu_map **orig,
>>                               struct perf_cpu_map *other)
> 
> Sorry for typo and spamming. The above suggested definition is for perf_cpu_map__merge().

Yes - there is not much reason to have perf_cpu_map__merge()
and perf_cpu_map__merge_in().

> 
> 
>> This can allow us to avoid any confusion in the first place. And we don't need
>> to maintain two functions for the same thing.
>>
>> Thanks,
>> Leo
>>
>>

Re: [PATCH v1 1/5] libperf cpumap: Correct reference count for perf_cpu_map__merge()

Posted by Leo Yan 1 month, 2 weeks ago

On 10/11/24 10:46, Adrian Hunter wrote:

[...]

>>>> int perf_cpu_map__merge_in(struct perf_cpu_map **orig, struct perf_cpu_map *other)
>>>> {
>>>>           struct perf_cpu_map *result = perf_cpu_map__merge(*orig, other);
>>>>
>>>>           if (!result)
>>>>                   return -ENOMEM;
>>>>
>>>>           *orig = result;
>>>>           return 0;
>>>> }
>>>>
>>>> without any changes to perf_cpu_map__merge().
>>>
>>> Just wandering why we cannot do the same thing for the perf_cpu_map__merge()
>>> function?
>>>
>>>     int perf_cpu_map__merge_in(struct perf_cpu_map **orig,
>>>                                struct perf_cpu_map *other)
>>
>> Sorry for typo and spamming. The above suggested definition is for perf_cpu_map__merge().
> 
> Yes - there is not much reason to have perf_cpu_map__merge()
> and perf_cpu_map__merge_in().

Thanks for suggestion!  Will move towards this.

Leo