[PATCH drivers/perf: hisi:] drivers/perf: hisi: fix NULL pointer issue when uninstall hns3 pmu driver

Jijie Shao posted 1 patch 2 years, 2 months ago
drivers/perf/hisilicon/hns3_pmu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
[PATCH drivers/perf: hisi:] drivers/perf: hisi: fix NULL pointer issue when uninstall hns3 pmu driver
Posted by Jijie Shao 2 years, 2 months ago
From: Hao Chen <chenhao418@huawei.com>

When uninstall hns3 pmu driver, it will call cpuhp_state_remove_instance()
and then callback function hns3_pmu_offline_cpu() is called, it may cause
NULL pointer call trace when other driver is installing or uninstalling
concurrently.

As John Garry's opinion, cpuhp_state_remove_instance() is used for shared
interrupt, and using cpuhp_state_remove_instance_nocalls() is fine for PCIe
or HNS3 pmu.

So, replace cpuhp_state_remove_instance() with
cpuhp_state_remove_instance_nocalls() to fix this problem.

Fixes: 66637ab137b4 ("drivers/perf: hisi: add driver for HNS3 PMU")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 drivers/perf/hisilicon/hns3_pmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
index e0457d84af6b..16869bf5bf4c 100644
--- a/drivers/perf/hisilicon/hns3_pmu.c
+++ b/drivers/perf/hisilicon/hns3_pmu.c
@@ -1556,8 +1556,8 @@ static int hns3_pmu_init_pmu(struct pci_dev *pdev, struct hns3_pmu *hns3_pmu)
 	ret = perf_pmu_register(&hns3_pmu->pmu, hns3_pmu->pmu.name, -1);
 	if (ret) {
 		pci_err(pdev, "failed to register perf PMU, ret = %d.\n", ret);
-		cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
-					    &hns3_pmu->node);
+		cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
+						    &hns3_pmu->node);
 	}
 
 	return ret;
@@ -1568,8 +1568,8 @@ static void hns3_pmu_uninit_pmu(struct pci_dev *pdev)
 	struct hns3_pmu *hns3_pmu = pci_get_drvdata(pdev);
 
 	perf_pmu_unregister(&hns3_pmu->pmu);
-	cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
-				    &hns3_pmu->node);
+	cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
+					    &hns3_pmu->node);
 }
 
 static int hns3_pmu_init_dev(struct pci_dev *pdev)
-- 
2.30.0
Re: [PATCH drivers/perf: hisi:] drivers/perf: hisi: fix NULL pointer issue when uninstall hns3 pmu driver
Posted by Yicong Yang 2 years, 2 months ago
Hi Jijie,

On 2023/10/9 18:50, Jijie Shao wrote:
> From: Hao Chen <chenhao418@huawei.com>
> 
> When uninstall hns3 pmu driver, it will call cpuhp_state_remove_instance()
> and then callback function hns3_pmu_offline_cpu() is called, it may cause
> NULL pointer call trace when other driver is installing or uninstalling
> concurrently.
> 

More information about the calltrace you've met and how to reproduce this?
I'm not sure why other drivers are involved.

> As John Garry's opinion, cpuhp_state_remove_instance() is used for shared
> interrupt, and using cpuhp_state_remove_instance_nocalls() is fine for PCIe
> or HNS3 pmu.
> 

I'm a bit confused here. We need to update the using CPU and migrate the perf
context as well as the interrupt affinity in cpuhp::teardown() callback, so
it make sense to not call this on driver detachment. But I cannot figure
out why this is related to the shared interrupt, more details?

> So, replace cpuhp_state_remove_instance() with
> cpuhp_state_remove_instance_nocalls() to fix this problem.
> 
> Fixes: 66637ab137b4 ("drivers/perf: hisi: add driver for HNS3 PMU")
> Signed-off-by: Hao Chen <chenhao418@huawei.com>
> Signed-off-by: Jijie Shao <shaojijie@huawei.com>
> ---
>  drivers/perf/hisilicon/hns3_pmu.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
> index e0457d84af6b..16869bf5bf4c 100644
> --- a/drivers/perf/hisilicon/hns3_pmu.c
> +++ b/drivers/perf/hisilicon/hns3_pmu.c
> @@ -1556,8 +1556,8 @@ static int hns3_pmu_init_pmu(struct pci_dev *pdev, struct hns3_pmu *hns3_pmu)
>  	ret = perf_pmu_register(&hns3_pmu->pmu, hns3_pmu->pmu.name, -1);
>  	if (ret) {
>  		pci_err(pdev, "failed to register perf PMU, ret = %d.\n", ret);
> -		cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
> -					    &hns3_pmu->node);
> +		cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
> +						    &hns3_pmu->node);
>  	}
>  
>  	return ret;
> @@ -1568,8 +1568,8 @@ static void hns3_pmu_uninit_pmu(struct pci_dev *pdev)
>  	struct hns3_pmu *hns3_pmu = pci_get_drvdata(pdev);
>  
>  	perf_pmu_unregister(&hns3_pmu->pmu);
> -	cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
> -				    &hns3_pmu->node);
> +	cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
> +					    &hns3_pmu->node);
>  }
Thanks.
Re: [PATCH drivers/perf: hisi:] drivers/perf: hisi: fix NULL pointer issue when uninstall hns3 pmu driver
Posted by Jijie Shao 2 years, 2 months ago
on 2023/10/10 17:32, Yicong Yang wrote:
> Hi Jijie,
>
> On 2023/10/9 18:50, Jijie Shao wrote:
>> From: Hao Chen <chenhao418@huawei.com>
>>
>> When uninstall hns3 pmu driver, it will call cpuhp_state_remove_instance()
>> and then callback function hns3_pmu_offline_cpu() is called, it may cause
>> NULL pointer call trace when other driver is installing or uninstalling
>> concurrently.
>>
> More information about the calltrace you've met and how to reproduce this?
> I'm not sure why other drivers are involved.
>
>> As John Garry's opinion, cpuhp_state_remove_instance() is used for shared
>> interrupt, and using cpuhp_state_remove_instance_nocalls() is fine for PCIe
>> or HNS3 pmu.
>>
> I'm a bit confused here. We need to update the using CPU and migrate the perf
> context as well as the interrupt affinity in cpuhp::teardown() callback, so
> it make sense to not call this on driver detachment. But I cannot figure
> out why this is related to the shared interrupt, more details?
>
ok,I will send v2 to add more details.
Thanks

Re: [PATCH drivers/perf: hisi:] drivers/perf: hisi: fix NULL pointer issue when uninstall hns3 pmu driver
Posted by Robin Murphy 2 years, 2 months ago
On 11/10/2023 9:37 am, Jijie Shao wrote:
> 
> on 2023/10/10 17:32, Yicong Yang wrote:
>> Hi Jijie,
>>
>> On 2023/10/9 18:50, Jijie Shao wrote:
>>> From: Hao Chen <chenhao418@huawei.com>
>>>
>>> When uninstall hns3 pmu driver, it will call 
>>> cpuhp_state_remove_instance()
>>> and then callback function hns3_pmu_offline_cpu() is called, it may 
>>> cause
>>> NULL pointer call trace when other driver is installing or uninstalling
>>> concurrently.
>>>
>> More information about the calltrace you've met and how to reproduce 
>> this?
>> I'm not sure why other drivers are involved.
>>
>>> As John Garry's opinion, cpuhp_state_remove_instance() is used for 
>>> shared
>>> interrupt, and using cpuhp_state_remove_instance_nocalls() is fine 
>>> for PCIe
>>> or HNS3 pmu.
>>>
>> I'm a bit confused here. We need to update the using CPU and migrate 
>> the perf
>> context as well as the interrupt affinity in cpuhp::teardown() 
>> callback, so
>> it make sense to not call this on driver detachment. But I cannot figure
>> out why this is related to the shared interrupt, more details?
>>
> ok,I will send v2 to add more details.

This shouldn't have anything to do with concurrency or shared interrupts 
or anything else. It's simply that we should clearly not attempt to 
migrate a PMU context (via invoking the hotplug callbacks) *after* the 
relevant PMU has already been unregistered, since that's liable to lead 
to some kind of use-after-free, and at best it's just a pointless waste 
of time anyway - if we've got to the point of unbinding the driver (or 
failing to probe at all), there should definitely not be any active 
events or other PMU state that needs updating.

Thanks,
Robin.
Re: [PATCH drivers/perf: hisi:] drivers/perf: hisi: fix NULL pointer issue when uninstall hns3 pmu driver
Posted by Yicong Yang 2 years, 2 months ago
Hi Robin,

On 2023/10/12 0:03, Robin Murphy wrote:
> On 11/10/2023 9:37 am, Jijie Shao wrote:
>>
>> on 2023/10/10 17:32, Yicong Yang wrote:
>>> Hi Jijie,
>>>
>>> On 2023/10/9 18:50, Jijie Shao wrote:
>>>> From: Hao Chen <chenhao418@huawei.com>
>>>>
>>>> When uninstall hns3 pmu driver, it will call cpuhp_state_remove_instance()
>>>> and then callback function hns3_pmu_offline_cpu() is called, it may cause
>>>> NULL pointer call trace when other driver is installing or uninstalling
>>>> concurrently.
>>>>
>>> More information about the calltrace you've met and how to reproduce this?
>>> I'm not sure why other drivers are involved.
>>>
>>>> As John Garry's opinion, cpuhp_state_remove_instance() is used for shared
>>>> interrupt, and using cpuhp_state_remove_instance_nocalls() is fine for PCIe
>>>> or HNS3 pmu.
>>>>
>>> I'm a bit confused here. We need to update the using CPU and migrate the perf
>>> context as well as the interrupt affinity in cpuhp::teardown() callback, so
>>> it make sense to not call this on driver detachment. But I cannot figure
>>> out why this is related to the shared interrupt, more details?
>>>
>> ok,I will send v2 to add more details.
> 
> This shouldn't have anything to do with concurrency or shared interrupts or anything else. It's simply that we should clearly not attempt to migrate a PMU context (via invoking the hotplug callbacks) *after* the relevant PMU has already been unregistered, since that's liable to lead to some kind of use-after-free, and at best it's just a pointless waste of time anyway - if we've got to the point of unbinding the driver (or failing to probe at all), there should definitely not be any active events or other PMU state that needs updating.
> 

Thanks for the clarification. I think this is the root reason
of the problem met on the hns3 pmu.

Thanks.