[PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM

Hans Zhang posted 1 patch 23 hours ago
drivers/pci/controller/dwc/pcie-tegra194.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
[PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Hans Zhang 23 hours ago
When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally call
debugfs_remove_recursive(), leading to potential NULL pointer operations.

Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
stubbed for !CONFIG_PCIEASPM. Use this function during removal/shutdown to
ensure debugfs cleanup only occurs when entries were initialized.

This prevents kernel warnings and instability when ASPM support is
disabled.

Signed-off-by: Hans Zhang <18255117159@163.com>
---
 drivers/pci/controller/dwc/pcie-tegra194.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 5103995cd6c7..d762e733c2d8 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -716,11 +716,20 @@ static void init_debugfs(struct tegra_pcie_dw *pcie)
 	debugfs_create_devm_seqfile(pcie->dev, "aspm_state_cnt", pcie->debugfs,
 				    aspm_state_cnt);
 }
+
+static void deinit_debugfs(struct tegra_pcie_dw *pcie)
+{
+	if (!pcie->debugfs)
+		return;
+
+	debugfs_remove_recursive(pcie->debugfs);
+}
 #else
 static inline void disable_aspm_l12(struct tegra_pcie_dw *pcie) { return; }
 static inline void disable_aspm_l11(struct tegra_pcie_dw *pcie) { return; }
 static inline void init_host_aspm(struct tegra_pcie_dw *pcie) { return; }
 static inline void init_debugfs(struct tegra_pcie_dw *pcie) { return; }
+static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
 #endif
 
 static void tegra_pcie_enable_system_interrupts(struct dw_pcie_rp *pp)
@@ -2289,7 +2298,7 @@ static void tegra_pcie_dw_remove(struct platform_device *pdev)
 		if (!pcie->link_state)
 			return;
 
-		debugfs_remove_recursive(pcie->debugfs);
+		deinit_debugfs(pcie->debugfs);
 		tegra_pcie_deinit_controller(pcie);
 		pm_runtime_put_sync(pcie->dev);
 	} else {
@@ -2408,7 +2417,7 @@ static void tegra_pcie_dw_shutdown(struct platform_device *pdev)
 		if (!pcie->link_state)
 			return;
 
-		debugfs_remove_recursive(pcie->debugfs);
+		deinit_debugfs(pcie->debugfs);
 		tegra_pcie_downstream_dev_to_D0(pcie);
 
 		disable_irq(pcie->pci.pp.irq);

base-commit: a8662bcd2ff152bfbc751cab20f33053d74d0963
-- 
2.25.1
Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by kernel test robot 12 hours ago
Hi Hans,

kernel test robot noticed the following build errors:

[auto build test ERROR on a8662bcd2ff152bfbc751cab20f33053d74d0963]

url:    https://github.com/intel-lab-lkp/linux/commits/Hans-Zhang/pci-tegra194-Fix-debugfs-cleanup-for-CONFIG_PCIEASPM/20250405-230047
base:   a8662bcd2ff152bfbc751cab20f33053d74d0963
patch link:    https://lore.kernel.org/r/20250405145459.26800-1-18255117159%40163.com
patch subject: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
config: arm-randconfig-001-20250406 (https://download.01.org/0day-ci/archive/20250406/202504060938.xa7VrE6O-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 92c93f5286b9ff33f27ff694d2dc33da1c07afdd)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250406/202504060938.xa7VrE6O-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504060938.xa7VrE6O-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/pci/controller/dwc/pcie-tegra194.c:2301:18: error: incompatible pointer types passing 'struct dentry *' to parameter of type 'struct tegra_pcie_dw *' [-Werror,-Wincompatible-pointer-types]
    2301 |                 deinit_debugfs(pcie->debugfs);
         |                                ^~~~~~~~~~~~~
   drivers/pci/controller/dwc/pcie-tegra194.c:732:57: note: passing argument to parameter 'pcie' here
     732 | static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
         |                                                         ^
   drivers/pci/controller/dwc/pcie-tegra194.c:2420:18: error: incompatible pointer types passing 'struct dentry *' to parameter of type 'struct tegra_pcie_dw *' [-Werror,-Wincompatible-pointer-types]
    2420 |                 deinit_debugfs(pcie->debugfs);
         |                                ^~~~~~~~~~~~~
   drivers/pci/controller/dwc/pcie-tegra194.c:732:57: note: passing argument to parameter 'pcie' here
     732 | static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
         |                                                         ^
   2 errors generated.


vim +2301 drivers/pci/controller/dwc/pcie-tegra194.c

  2292	
  2293	static void tegra_pcie_dw_remove(struct platform_device *pdev)
  2294	{
  2295		struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);
  2296	
  2297		if (pcie->of_data->mode == DW_PCIE_RC_TYPE) {
  2298			if (!pcie->link_state)
  2299				return;
  2300	
> 2301			deinit_debugfs(pcie->debugfs);
  2302			tegra_pcie_deinit_controller(pcie);
  2303			pm_runtime_put_sync(pcie->dev);
  2304		} else {
  2305			disable_irq(pcie->pex_rst_irq);
  2306			pex_ep_event_pex_rst_assert(pcie);
  2307		}
  2308	
  2309		pm_runtime_disable(pcie->dev);
  2310		tegra_bpmp_put(pcie->bpmp);
  2311		if (pcie->pex_refclk_sel_gpiod)
  2312			gpiod_set_value(pcie->pex_refclk_sel_gpiod, 0);
  2313	}
  2314	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by kernel test robot 13 hours ago
Hi Hans,

kernel test robot noticed the following build errors:

[auto build test ERROR on a8662bcd2ff152bfbc751cab20f33053d74d0963]

url:    https://github.com/intel-lab-lkp/linux/commits/Hans-Zhang/pci-tegra194-Fix-debugfs-cleanup-for-CONFIG_PCIEASPM/20250405-230047
base:   a8662bcd2ff152bfbc751cab20f33053d74d0963
patch link:    https://lore.kernel.org/r/20250405145459.26800-1-18255117159%40163.com
patch subject: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
config: arm-randconfig-002-20250406 (https://download.01.org/0day-ci/archive/20250406/202504060814.4HRuwajf-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250406/202504060814.4HRuwajf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504060814.4HRuwajf-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/pci/controller/dwc/pcie-tegra194.c: In function 'tegra_pcie_dw_remove':
>> drivers/pci/controller/dwc/pcie-tegra194.c:2301:18: error: passing argument 1 of 'deinit_debugfs' from incompatible pointer type [-Werror=incompatible-pointer-types]
      deinit_debugfs(pcie->debugfs);
                     ^~~~
   drivers/pci/controller/dwc/pcie-tegra194.c:732:20: note: expected 'struct tegra_pcie_dw *' but argument is of type 'struct dentry *'
    static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
                       ^~~~~~~~~~~~~~
   drivers/pci/controller/dwc/pcie-tegra194.c: In function 'tegra_pcie_dw_shutdown':
   drivers/pci/controller/dwc/pcie-tegra194.c:2420:18: error: passing argument 1 of 'deinit_debugfs' from incompatible pointer type [-Werror=incompatible-pointer-types]
      deinit_debugfs(pcie->debugfs);
                     ^~~~
   drivers/pci/controller/dwc/pcie-tegra194.c:732:20: note: expected 'struct tegra_pcie_dw *' but argument is of type 'struct dentry *'
    static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
                       ^~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/deinit_debugfs +2301 drivers/pci/controller/dwc/pcie-tegra194.c

  2292	
  2293	static void tegra_pcie_dw_remove(struct platform_device *pdev)
  2294	{
  2295		struct tegra_pcie_dw *pcie = platform_get_drvdata(pdev);
  2296	
  2297		if (pcie->of_data->mode == DW_PCIE_RC_TYPE) {
  2298			if (!pcie->link_state)
  2299				return;
  2300	
> 2301			deinit_debugfs(pcie->debugfs);
  2302			tegra_pcie_deinit_controller(pcie);
  2303			pm_runtime_put_sync(pcie->dev);
  2304		} else {
  2305			disable_irq(pcie->pex_rst_irq);
  2306			pex_ep_event_pex_rst_assert(pcie);
  2307		}
  2308	
  2309		pm_runtime_disable(pcie->dev);
  2310		tegra_bpmp_put(pcie->bpmp);
  2311		if (pcie->pex_refclk_sel_gpiod)
  2312			gpiod_set_value(pcie->pex_refclk_sel_gpiod, 0);
  2313	}
  2314	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Christophe JAILLET 22 hours ago
Le 05/04/2025 à 16:54, Hans Zhang a écrit :
> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally call
> debugfs_remove_recursive(), leading to potential NULL pointer operations.
> 
> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
> stubbed for !CONFIG_PCIEASPM. Use this function during removal/shutdown to
> ensure debugfs cleanup only occurs when entries were initialized.
> 
> This prevents kernel warnings and instability when ASPM support is
> disabled.
> 

Could you elaborate?


debugfs_remove_recursive() ends either to:

static inline void debugfs_remove(struct dentry *dentry)
{ }
if CONFIG_DEBUG_FS is not set,

or
to a function which starts with:
	if (IS_ERR_OR_NULL(dentry))
		return;
if it is set.


So what does this new deinit_debugfs() add?


Which NULL pointer are you seeing?
Did you actually manage to trigger it?

CJ



> Signed-off-by: Hans Zhang <18255117159-9Onoh4P/yGk@public.gmane.org>
> ---
>   drivers/pci/controller/dwc/pcie-tegra194.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 5103995cd6c7..d762e733c2d8 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -716,11 +716,20 @@ static void init_debugfs(struct tegra_pcie_dw *pcie)
>   	debugfs_create_devm_seqfile(pcie->dev, "aspm_state_cnt", pcie->debugfs,
>   				    aspm_state_cnt);
>   }
> +
> +static void deinit_debugfs(struct tegra_pcie_dw *pcie)
> +{
> +	if (!pcie->debugfs)
> +		return;
> +
> +	debugfs_remove_recursive(pcie->debugfs);
> +}
>   #else
>   static inline void disable_aspm_l12(struct tegra_pcie_dw *pcie) { return; }
>   static inline void disable_aspm_l11(struct tegra_pcie_dw *pcie) { return; }
>   static inline void init_host_aspm(struct tegra_pcie_dw *pcie) { return; }
>   static inline void init_debugfs(struct tegra_pcie_dw *pcie) { return; }
> +static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
>   #endif
>   
>   static void tegra_pcie_enable_system_interrupts(struct dw_pcie_rp *pp)
> @@ -2289,7 +2298,7 @@ static void tegra_pcie_dw_remove(struct platform_device *pdev)
>   		if (!pcie->link_state)
>   			return;
>   
> -		debugfs_remove_recursive(pcie->debugfs);
> +		deinit_debugfs(pcie->debugfs);
>   		tegra_pcie_deinit_controller(pcie);
>   		pm_runtime_put_sync(pcie->dev);
>   	} else {
> @@ -2408,7 +2417,7 @@ static void tegra_pcie_dw_shutdown(struct platform_device *pdev)
>   		if (!pcie->link_state)
>   			return;
>   
> -		debugfs_remove_recursive(pcie->debugfs);
> +		deinit_debugfs(pcie->debugfs);
>   		tegra_pcie_downstream_dev_to_D0(pcie);
>   
>   		disable_irq(pcie->pci.pp.irq);
> 
> base-commit: a8662bcd2ff152bfbc751cab20f33053d74d0963

Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Hans Zhang 21 hours ago

On 2025/4/6 00:14, Christophe JAILLET wrote:
> Le 05/04/2025 à 16:54, Hans Zhang a écrit :
>> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally call
>> debugfs_remove_recursive(), leading to potential NULL pointer operations.
>>
>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
>> stubbed for !CONFIG_PCIEASPM. Use this function during 
>> removal/shutdown to
>> ensure debugfs cleanup only occurs when entries were initialized.
>>
>> This prevents kernel warnings and instability when ASPM support is
>> disabled.
>>
> 
> Could you elaborate?
> 
> 
> debugfs_remove_recursive() ends either to:
> 
> static inline void debugfs_remove(struct dentry *dentry)
> { }
> if CONFIG_DEBUG_FS is not set,
> 
> or
> to a function which starts with:
>      if (IS_ERR_OR_NULL(dentry))
>          return;
> if it is set.
> 
> 
> So what does this new deinit_debugfs() add?
> 
> 
> Which NULL pointer are you seeing?
> Did you actually manage to trigger it?
> 


Hi Christophe,

You're right, and I'm sorry about that.

The following line of code only makes sense if the #if 
defined(CONFIG_PCIEASPM) condition holds. Do we need to optimize this?

pcie->debugfs = debugfs_create_dir(name, NULL);

Best regards,
Hans

Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Bjorn Helgaas 23 hours ago
Follow subject line capitalization convention.

On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally call
> debugfs_remove_recursive(), leading to potential NULL pointer operations.
> 
> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
> stubbed for !CONFIG_PCIEASPM. Use this function during removal/shutdown to
> ensure debugfs cleanup only occurs when entries were initialized.
> 
> This prevents kernel warnings and instability when ASPM support is
> disabled.

This looks like there should be a Fixes: tag to connect this to the
commit that introduced the problem.

If this is something that broke with the v6.15 merge window, we should
include this in v6.15 via pci/for-linus.  If this broke earlier, we
would have to decide whether pci/for-linus is still appropriate or a
stable tag.

We did merge some debugfs things for v6.15, but I don't see anything
specific to pcie-tegra194.c, so I'm confused about why this fix would
be in pcie-tegra194.c instead of some more generic place.

> Signed-off-by: Hans Zhang <18255117159@163.com>
> ---
>  drivers/pci/controller/dwc/pcie-tegra194.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 5103995cd6c7..d762e733c2d8 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -716,11 +716,20 @@ static void init_debugfs(struct tegra_pcie_dw *pcie)
>  	debugfs_create_devm_seqfile(pcie->dev, "aspm_state_cnt", pcie->debugfs,
>  				    aspm_state_cnt);
>  }
> +
> +static void deinit_debugfs(struct tegra_pcie_dw *pcie)
> +{
> +	if (!pcie->debugfs)
> +		return;
> +
> +	debugfs_remove_recursive(pcie->debugfs);
> +}
>  #else
>  static inline void disable_aspm_l12(struct tegra_pcie_dw *pcie) { return; }
>  static inline void disable_aspm_l11(struct tegra_pcie_dw *pcie) { return; }
>  static inline void init_host_aspm(struct tegra_pcie_dw *pcie) { return; }
>  static inline void init_debugfs(struct tegra_pcie_dw *pcie) { return; }
> +static inline void deinit_debugfs(struct tegra_pcie_dw *pcie) { return; }
>  #endif
>  
>  static void tegra_pcie_enable_system_interrupts(struct dw_pcie_rp *pp)
> @@ -2289,7 +2298,7 @@ static void tegra_pcie_dw_remove(struct platform_device *pdev)
>  		if (!pcie->link_state)
>  			return;
>  
> -		debugfs_remove_recursive(pcie->debugfs);
> +		deinit_debugfs(pcie->debugfs);
>  		tegra_pcie_deinit_controller(pcie);
>  		pm_runtime_put_sync(pcie->dev);
>  	} else {
> @@ -2408,7 +2417,7 @@ static void tegra_pcie_dw_shutdown(struct platform_device *pdev)
>  		if (!pcie->link_state)
>  			return;
>  
> -		debugfs_remove_recursive(pcie->debugfs);
> +		deinit_debugfs(pcie->debugfs);
>  		tegra_pcie_downstream_dev_to_D0(pcie);
>  
>  		disable_irq(pcie->pci.pp.irq);
> 
> base-commit: a8662bcd2ff152bfbc751cab20f33053d74d0963
> -- 
> 2.25.1
>
Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Hans Zhang 22 hours ago

On 2025/4/5 23:28, Bjorn Helgaas wrote:
> Follow subject line capitalization convention.
> 
> On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
>> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally call
>> debugfs_remove_recursive(), leading to potential NULL pointer operations.
>>
>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
>> stubbed for !CONFIG_PCIEASPM. Use this function during removal/shutdown to
>> ensure debugfs cleanup only occurs when entries were initialized.
>>
>> This prevents kernel warnings and instability when ASPM support is
>> disabled.
> 
> This looks like there should be a Fixes: tag to connect this to the
> commit that introduced the problem.

Hi Bjorn,

Thanks your for reply. Will add.

Fixes: bb617cbd8151 (PCI: tegra194: Clean up the exit path for Endpoint 
mode)

> 
> If this is something that broke with the v6.15 merge window, we should
> include this in v6.15 via pci/for-linus.  If this broke earlier, we
> would have to decide whether pci/for-linus is still appropriate or a
> stable tag.
> 

The original code that introduced the unconditional 
`debugfs_remove_recursive()` calls was actually merged in an earlier cycle.

> We did merge some debugfs things for v6.15, but I don't see anything
> specific to pcie-tegra194.c, so I'm confused about why this fix would
> be in pcie-tegra194.c instead of some more generic place.
> 

The Tegra194 driver conditionally initializes pcie->debugfs based on 
CONFIG_PCIEASPM. When ASPM is disabled, pcie->debugfs remains 
uninitialized, but tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() 
unconditionally call debugfs_remove_recursive(), leading to a NULL 
pointer dereference. This is specific to the Tegra194 implementation, as 
other drivers or core PCI code may already guard debugfs cleanup against 
uninitialized states through different mechanisms.

Best regards,
Hans
Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Christophe JAILLET 22 hours ago
Le 05/04/2025 à 17:49, Hans Zhang a écrit :
> 
> 
> On 2025/4/5 23:28, Bjorn Helgaas wrote:
>> Follow subject line capitalization convention.
>>
>> On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
>>> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
>>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally call
>>> debugfs_remove_recursive(), leading to potential NULL pointer 
>>> operations.
>>>
>>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
>>> stubbed for !CONFIG_PCIEASPM. Use this function during removal/ 
>>> shutdown to
>>> ensure debugfs cleanup only occurs when entries were initialized.
>>>
>>> This prevents kernel warnings and instability when ASPM support is
>>> disabled.
>>
>> This looks like there should be a Fixes: tag to connect this to the
>> commit that introduced the problem.
> 
> Hi Bjorn,
> 
> Thanks your for reply. Will add.
> 
> Fixes: bb617cbd8151 (PCI: tegra194: Clean up the exit path for Endpoint 
> mode)
> 
>>
>> If this is something that broke with the v6.15 merge window, we should
>> include this in v6.15 via pci/for-linus.  If this broke earlier, we
>> would have to decide whether pci/for-linus is still appropriate or a
>> stable tag.
>>
> 
> The original code that introduced the unconditional 
> `debugfs_remove_recursive()` calls was actually merged in an earlier cycle.
> 
>> We did merge some debugfs things for v6.15, but I don't see anything
>> specific to pcie-tegra194.c, so I'm confused about why this fix would
>> be in pcie-tegra194.c instead of some more generic place.
>>
> 
> The Tegra194 driver conditionally initializes pcie->debugfs based on 
> CONFIG_PCIEASPM. When ASPM is disabled, pcie->debugfs remains 
> uninitialized, but tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() 
> unconditionally call debugfs_remove_recursive(), leading to a NULL 

debugfs IS initialized, because it is in a structure allocated with 
devm_kzalloc().

And debugfs functions handle such cases.

CJ

> pointer dereference. This is specific to the Tegra194 implementation, as 
> other drivers or core PCI code may already guard debugfs cleanup against 
> uninitialized states through different mechanisms.
> 
> Best regards,
> Hans
> 
> 
> 

Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Hans Zhang 22 hours ago

On 2025/4/6 00:17, Christophe JAILLET wrote:
> Le 05/04/2025 à 17:49, Hans Zhang a écrit :
>>
>>
>> On 2025/4/5 23:28, Bjorn Helgaas wrote:
>>> Follow subject line capitalization convention.
>>>
>>> On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
>>>> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
>>>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally 
>>>> call
>>>> debugfs_remove_recursive(), leading to potential NULL pointer 
>>>> operations.
>>>>
>>>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), which is
>>>> stubbed for !CONFIG_PCIEASPM. Use this function during removal/ 
>>>> shutdown to
>>>> ensure debugfs cleanup only occurs when entries were initialized.
>>>>
>>>> This prevents kernel warnings and instability when ASPM support is
>>>> disabled.
>>>
>>> This looks like there should be a Fixes: tag to connect this to the
>>> commit that introduced the problem.
>>
>> Hi Bjorn,
>>
>> Thanks your for reply. Will add.
>>
>> Fixes: bb617cbd8151 (PCI: tegra194: Clean up the exit path for 
>> Endpoint mode)
>>
>>>
>>> If this is something that broke with the v6.15 merge window, we should
>>> include this in v6.15 via pci/for-linus.  If this broke earlier, we
>>> would have to decide whether pci/for-linus is still appropriate or a
>>> stable tag.
>>>
>>
>> The original code that introduced the unconditional 
>> `debugfs_remove_recursive()` calls was actually merged in an earlier 
>> cycle.
>>
>>> We did merge some debugfs things for v6.15, but I don't see anything
>>> specific to pcie-tegra194.c, so I'm confused about why this fix would
>>> be in pcie-tegra194.c instead of some more generic place.
>>>
>>
>> The Tegra194 driver conditionally initializes pcie->debugfs based on 
>> CONFIG_PCIEASPM. When ASPM is disabled, pcie->debugfs remains 
>> uninitialized, but tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() 
>> unconditionally call debugfs_remove_recursive(), leading to a NULL 
> 
> debugfs IS initialized, because it is in a structure allocated with 
> devm_kzalloc().
> 
> And debugfs functions handle such cases.
> 

Oh, my mind went wrong and I didn't pay attention to devm, and I'm 
really sorry about that.

Another problem I noticed here is that currently, no matter what, 
pcie->debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
defined(CONFIG_PCIEASPM) is valid, then pcie->debugfs = 
debugfs_create_dir(name, NULL); Is it superfluous?

Best regards,
Hans

Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Hans Zhang 21 hours ago

On 2025/4/6 00:35, Hans Zhang wrote:
> 
> 
> On 2025/4/6 00:17, Christophe JAILLET wrote:
>> Le 05/04/2025 à 17:49, Hans Zhang a écrit :
>>>
>>>
>>> On 2025/4/5 23:28, Bjorn Helgaas wrote:
>>>> Follow subject line capitalization convention.
>>>>
>>>> On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
>>>>> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, but
>>>>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() unconditionally 
>>>>> call
>>>>> debugfs_remove_recursive(), leading to potential NULL pointer 
>>>>> operations.
>>>>>
>>>>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), 
>>>>> which is
>>>>> stubbed for !CONFIG_PCIEASPM. Use this function during removal/ 
>>>>> shutdown to
>>>>> ensure debugfs cleanup only occurs when entries were initialized.
>>>>>
>>>>> This prevents kernel warnings and instability when ASPM support is
>>>>> disabled.
>>>>
>>>> This looks like there should be a Fixes: tag to connect this to the
>>>> commit that introduced the problem.
>>>
>>> Hi Bjorn,
>>>
>>> Thanks your for reply. Will add.
>>>
>>> Fixes: bb617cbd8151 (PCI: tegra194: Clean up the exit path for 
>>> Endpoint mode)
>>>
>>>>
>>>> If this is something that broke with the v6.15 merge window, we should
>>>> include this in v6.15 via pci/for-linus.  If this broke earlier, we
>>>> would have to decide whether pci/for-linus is still appropriate or a
>>>> stable tag.
>>>>
>>>
>>> The original code that introduced the unconditional 
>>> `debugfs_remove_recursive()` calls was actually merged in an earlier 
>>> cycle.
>>>
>>>> We did merge some debugfs things for v6.15, but I don't see anything
>>>> specific to pcie-tegra194.c, so I'm confused about why this fix would
>>>> be in pcie-tegra194.c instead of some more generic place.
>>>>
>>>
>>> The Tegra194 driver conditionally initializes pcie->debugfs based on 
>>> CONFIG_PCIEASPM. When ASPM is disabled, pcie->debugfs remains 
>>> uninitialized, but tegra_pcie_dw_remove() and 
>>> tegra_pcie_dw_shutdown() unconditionally call 
>>> debugfs_remove_recursive(), leading to a NULL 
>>
>> debugfs IS initialized, because it is in a structure allocated with 
>> devm_kzalloc().
>>
>> And debugfs functions handle such cases.
>>
> 
> Oh, my mind went wrong and I didn't pay attention to devm, and I'm 
> really sorry about that.
> 
> Another problem I noticed here is that currently, no matter what, 
> pcie->debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
> defined(CONFIG_PCIEASPM) is valid, then pcie->debugfs = 
> debugfs_create_dir(name, NULL); Is it superfluous?

Sorry, let me reply again:
Another problem I noticed here is that currently, no matter what, 
pcie->debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
defined(CONFIG_PCIEASPM) is invalid, then pcie->debugfs = 
debugfs_create_dir(name, NULL); Is it superfluous?

Best regards,
Hans

Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Christophe JAILLET 21 hours ago
Le 05/04/2025 à 18:47, Hans Zhang a écrit :
> 
> 
> On 2025/4/6 00:35, Hans Zhang wrote:
>>
>>
>> On 2025/4/6 00:17, Christophe JAILLET wrote:
>>> Le 05/04/2025 à 17:49, Hans Zhang a écrit :
>>>>
>>>>
>>>> On 2025/4/5 23:28, Bjorn Helgaas wrote:
>>>>> Follow subject line capitalization convention.
>>>>>
>>>>> On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
>>>>>> When CONFIG_PCIEASPM is disabled, debugfs entries are not created, 
>>>>>> but
>>>>>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() 
>>>>>> unconditionally call
>>>>>> debugfs_remove_recursive(), leading to potential NULL pointer 
>>>>>> operations.
>>>>>>
>>>>>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), 
>>>>>> which is
>>>>>> stubbed for !CONFIG_PCIEASPM. Use this function during removal/ 
>>>>>> shutdown to
>>>>>> ensure debugfs cleanup only occurs when entries were initialized.
>>>>>>
>>>>>> This prevents kernel warnings and instability when ASPM support is
>>>>>> disabled.
>>>>>
>>>>> This looks like there should be a Fixes: tag to connect this to the
>>>>> commit that introduced the problem.
>>>>
>>>> Hi Bjorn,
>>>>
>>>> Thanks your for reply. Will add.
>>>>
>>>> Fixes: bb617cbd8151 (PCI: tegra194: Clean up the exit path for 
>>>> Endpoint mode)
>>>>
>>>>>
>>>>> If this is something that broke with the v6.15 merge window, we should
>>>>> include this in v6.15 via pci/for-linus.  If this broke earlier, we
>>>>> would have to decide whether pci/for-linus is still appropriate or a
>>>>> stable tag.
>>>>>
>>>>
>>>> The original code that introduced the unconditional 
>>>> `debugfs_remove_recursive()` calls was actually merged in an earlier 
>>>> cycle.
>>>>
>>>>> We did merge some debugfs things for v6.15, but I don't see anything
>>>>> specific to pcie-tegra194.c, so I'm confused about why this fix would
>>>>> be in pcie-tegra194.c instead of some more generic place.
>>>>>
>>>>
>>>> The Tegra194 driver conditionally initializes pcie->debugfs based on 
>>>> CONFIG_PCIEASPM. When ASPM is disabled, pcie->debugfs remains 
>>>> uninitialized, but tegra_pcie_dw_remove() and 
>>>> tegra_pcie_dw_shutdown() unconditionally call 
>>>> debugfs_remove_recursive(), leading to a NULL 
>>>
>>> debugfs IS initialized, because it is in a structure allocated with 
>>> devm_kzalloc().
>>>
>>> And debugfs functions handle such cases.
>>>
>>
>> Oh, my mind went wrong and I didn't pay attention to devm, and I'm 

Here, what is relevant in devm_kzalloc() is not devm but the kzalloc 
part. the "z" is for zeroing the allocated memory.

>> really sorry about that.
>>
>> Another problem I noticed here is that currently, no matter what, 
>> pcie->debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
>> defined(CONFIG_PCIEASPM) is valid, then pcie->debugfs = 
>> debugfs_create_dir(name, NULL); Is it superfluous?
> 
> Sorry, let me reply again:
> Another problem I noticed here is that currently, no matter what, pcie- 
>  >debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
> defined(CONFIG_PCIEASPM) is invalid, then pcie->debugfs = 
> debugfs_create_dir(name, NULL); Is it superfluous?

AFAICT, it looks useless in this case.

I guess that moving debugfs_create_dir() and the
name = devm_kasprintf()...
above it, into init_debugfs() would do the trick.

CJ

> 
> Best regards,
> Hans
> 
> 
> 

Re: [PATCH] pci: tegra194: Fix debugfs cleanup for !CONFIG_PCIEASPM
Posted by Hans Zhang 21 hours ago

On 2025/4/6 01:04, Christophe JAILLET wrote:
> Le 05/04/2025 à 18:47, Hans Zhang a écrit :
>>
>>
>> On 2025/4/6 00:35, Hans Zhang wrote:
>>>
>>>
>>> On 2025/4/6 00:17, Christophe JAILLET wrote:
>>>> Le 05/04/2025 à 17:49, Hans Zhang a écrit :
>>>>>
>>>>>
>>>>> On 2025/4/5 23:28, Bjorn Helgaas wrote:
>>>>>> Follow subject line capitalization convention.
>>>>>>
>>>>>> On Sat, Apr 05, 2025 at 10:54:59PM +0800, Hans Zhang wrote:
>>>>>>> When CONFIG_PCIEASPM is disabled, debugfs entries are not 
>>>>>>> created, but
>>>>>>> tegra_pcie_dw_remove() and tegra_pcie_dw_shutdown() 
>>>>>>> unconditionally call
>>>>>>> debugfs_remove_recursive(), leading to potential NULL pointer 
>>>>>>> operations.
>>>>>>>
>>>>>>> Introduce deinit_debugfs() to wrap debugfs_remove_recursive(), 
>>>>>>> which is
>>>>>>> stubbed for !CONFIG_PCIEASPM. Use this function during removal/ 
>>>>>>> shutdown to
>>>>>>> ensure debugfs cleanup only occurs when entries were initialized.
>>>>>>>
>>>>>>> This prevents kernel warnings and instability when ASPM support is
>>>>>>> disabled.
>>>>>>
>>>>>> This looks like there should be a Fixes: tag to connect this to the
>>>>>> commit that introduced the problem.
>>>>>
>>>>> Hi Bjorn,
>>>>>
>>>>> Thanks your for reply. Will add.
>>>>>
>>>>> Fixes: bb617cbd8151 (PCI: tegra194: Clean up the exit path for 
>>>>> Endpoint mode)
>>>>>
>>>>>>
>>>>>> If this is something that broke with the v6.15 merge window, we 
>>>>>> should
>>>>>> include this in v6.15 via pci/for-linus.  If this broke earlier, we
>>>>>> would have to decide whether pci/for-linus is still appropriate or a
>>>>>> stable tag.
>>>>>>
>>>>>
>>>>> The original code that introduced the unconditional 
>>>>> `debugfs_remove_recursive()` calls was actually merged in an 
>>>>> earlier cycle.
>>>>>
>>>>>> We did merge some debugfs things for v6.15, but I don't see anything
>>>>>> specific to pcie-tegra194.c, so I'm confused about why this fix would
>>>>>> be in pcie-tegra194.c instead of some more generic place.
>>>>>>
>>>>>
>>>>> The Tegra194 driver conditionally initializes pcie->debugfs based 
>>>>> on CONFIG_PCIEASPM. When ASPM is disabled, pcie->debugfs remains 
>>>>> uninitialized, but tegra_pcie_dw_remove() and 
>>>>> tegra_pcie_dw_shutdown() unconditionally call 
>>>>> debugfs_remove_recursive(), leading to a NULL 
>>>>
>>>> debugfs IS initialized, because it is in a structure allocated with 
>>>> devm_kzalloc().
>>>>
>>>> And debugfs functions handle such cases.
>>>>
>>>
>>> Oh, my mind went wrong and I didn't pay attention to devm, and I'm 
> 
> Here, what is relevant in devm_kzalloc() is not devm but the kzalloc 
> part. the "z" is for zeroing the allocated memory.
> 

Hi Christophe,

Thanks your for reply. I understand.

>>> really sorry about that.
>>>
>>> Another problem I noticed here is that currently, no matter what, 
>>> pcie->debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
>>> defined(CONFIG_PCIEASPM) is valid, then pcie->debugfs = 
>>> debugfs_create_dir(name, NULL); Is it superfluous?
>>
>> Sorry, let me reply again:
>> Another problem I noticed here is that currently, no matter what, 
>> pcie-  >debugfs = debugfs_create_dir(name, NULL) is executed; if #if 
>> defined(CONFIG_PCIEASPM) is invalid, then pcie->debugfs = 
>> debugfs_create_dir(name, NULL); Is it superfluous?
> 
> AFAICT, it looks useless in this case.
> 
> I guess that moving debugfs_create_dir() and the
> name = devm_kasprintf()...
> above it, into init_debugfs() would do the trick.

That's what I was thinking. I will submit the version again in the future.

Best regards,
Hans