[PATCH] drivers/base/node: Handle error properly in register_one_node()

Donet Tom posted 1 patch 7 months, 1 week ago
There is a newer version of this series
[PATCH] drivers/base/node: Handle error properly in register_one_node()
Posted by Donet Tom 7 months, 1 week ago
register_one_node() does not check the return value of register_node().
If register_node() fails, the function carries on and tries to register
CPUs under a node that was never registered, which is not correct.

Fix this by returning immediately from register_one_node() when
register_node() returns an error.

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---

This patch is based on the mm-unstable branch.

Fixes: 76b67ed9dce6 ("[PATCH] node hotplug: register cpu: remove node struct")

The issue has been present since the above commit, which is
quite old. Should I add a Fixes: tag and backport it to all
kernels that have this commit?
---
 drivers/base/node.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index bef84f01712f..aec991b4c0b2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -885,6 +885,8 @@ int register_one_node(int nid)
 	node_devices[nid] = node;
 
 	error = register_node(node_devices[nid], nid);
+	if (error)
+		return error;
 
 	/* link cpu under this node */
 	for_each_present_cpu(cpu) {
-- 
2.47.1
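
The control flow the patch introduces can be modelled outside the kernel. The following is a minimal userspace sketch, not the code in drivers/base/node.c: every `model_*` name, the `model_fail_register` knob, and the fixed CPU count are assumptions made up for illustration, and `calloc()` stands in for the kernel allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 4
#define MODEL_NR_CPUS   8

/* Hypothetical stand-in for struct node. */
struct model_node {
	int nid;
	int registered;
};

static struct model_node *model_node_devices[MODEL_MAX_NODES];
static int model_fail_register;	/* test knob: force registration to fail */
static int model_cpus_linked;	/* number of CPU links created so far */

/* Stand-in for register_node(): 0 on success, negative errno on failure. */
static int model_register_node(struct model_node *node, int nid)
{
	if (model_fail_register)
		return -ENOMEM;
	node->nid = nid;
	node->registered = 1;
	return 0;
}

/* Stand-in for register_one_node() with the early return from the patch. */
static int model_register_one_node(int nid)
{
	struct model_node *node;
	int error, cpu;

	assert(nid >= 0 && nid < MODEL_MAX_NODES);

	node = calloc(1, sizeof(*node));
	if (!node)
		return -ENOMEM;

	model_node_devices[nid] = node;

	error = model_register_node(model_node_devices[nid], nid);
	if (error)
		return error;	/* do not link CPUs under an unregistered node */

	/* link cpu under this node */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_cpus_linked++;

	return 0;
}
```

With `model_fail_register` set, `model_register_one_node()` returns the error and creates no CPU links. Like the patch as posted, the failure path of this model still leaves the allocation stored in `model_node_devices[nid]`.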
Re: [PATCH] drivers/base/node: Handle error properly in register_one_node()
Posted by Oscar Salvador 7 months, 1 week ago
On Wed, Jul 02, 2025 at 06:28:56AM -0500, Donet Tom wrote:
> If register_node() returns an error, it is not handled correctly.
> The function will proceed further and try to register CPUs under the
> node, which is not correct.
> 
> So, in this patch, if register_node() returns an error, we return
> immediately from the function.
> 
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
> 
... 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index bef84f01712f..aec991b4c0b2 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -885,6 +885,8 @@ int register_one_node(int nid)
>  	node_devices[nid] = node;
>  
>  	error = register_node(node_devices[nid], nid);
> +	if (error)
> +		return error;

OK, all current callers (based on mm-unstable) panic or BUG() if this fails,
except powerpc's init_phb_dynamic(), which keeps going, unless it panics
somewhere down the road as well.

So I think we need to: 

 node_devices[nid] = NULL;
 kfree(node);

 ?

Also, once Hannes' fix lands, we might need that as well.

Anyway, I'd suggest you hold off until Hannes' fix lands, so we can later
rebase all your mem-hotplug patches on top of that [1].

[1] https://lore.kernel.org/linux-mm/86f89a65-f0f6-4462-9eea-ac691de2f3b6@suse.de/T/#mbf392eb390b8053f96be50da3b40dfd9b62dd389


-- 
Oscar Salvador
SUSE Labs
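
The cleanup suggested above can be sketched in userspace as follows. This is an illustration under stated assumptions, not a proposed kernel change: the `demo_*` names and the `demo_fail_register` knob are hypothetical, and `free()` stands in for `kfree()`. In the kernel, whether a direct `kfree()` is appropriate depends on what register_node() has already done with the embedded device on failure, since the kernel-doc for device_register() says to drop the reference with put_device() rather than free the device directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_NODES 4

/* Hypothetical stand-in for struct node. */
struct demo_node {
	int nid;
};

static struct demo_node *demo_node_devices[DEMO_MAX_NODES];
static int demo_fail_register;	/* test knob: force registration to fail */

/* Stand-in for register_node(): 0 on success, negative errno on failure. */
static int demo_register_node(struct demo_node *node, int nid)
{
	if (demo_fail_register)
		return -EEXIST;
	node->nid = nid;
	return 0;
}

/* Error path that clears the table slot and releases the allocation. */
static int demo_register_one_node(int nid)
{
	struct demo_node *node;
	int error;

	assert(nid >= 0 && nid < DEMO_MAX_NODES);

	node = calloc(1, sizeof(*node));
	if (!node)
		return -ENOMEM;

	demo_node_devices[nid] = node;

	error = demo_register_node(node, nid);
	if (error) {
		/* Leave no stale pointer behind for later lookups. */
		demo_node_devices[nid] = NULL;
		free(node);
		return error;
	}

	return 0;
}
```

Because the slot is reset to NULL, a caller that keeps going after the failure sees the node as absent, and a later retry for the same nid starts from a clean state.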
Re: [PATCH] drivers/base/node: Handle error properly in register_one_node()
Posted by Donet Tom 7 months, 1 week ago
On 7/2/25 6:16 PM, Oscar Salvador wrote:
> On Wed, Jul 02, 2025 at 06:28:56AM -0500, Donet Tom wrote:
>> If register_node() returns an error, it is not handled correctly.
>> The function will proceed further and try to register CPUs under the
>> node, which is not correct.
>>
>> So, in this patch, if register_node() returns an error, we return
>> immediately from the function.
>>
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>> ---
>>
> ...
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index bef84f01712f..aec991b4c0b2 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -885,6 +885,8 @@ int register_one_node(int nid)
>>   	node_devices[nid] = node;
>>   
>>   	error = register_node(node_devices[nid], nid);
>> +	if (error)
>> +		return error;
> Ok, all current callers (based on mm-unstable) panic or BUG() if this fails,
> but powerpc, in init_phb_dynamic(), which keeps on going.
> Unless it panics somewhere down the road as well.
>
> So I think we need to:
>
>   node_devices[nid] = NULL
>   kfree(node)
>
>   ?


Yes, I will add this too.

But one question: if register_node() fails, is it okay to continue, or 
should we panic?

What is the correct way to handle this?


> Also, once Hannes fix lands, we might need that as well.
>
> Anyway, I'd suggest you hold on until Hannes fix lands, so we can later
> rebase all your mem-hotplug on top of that [1].

Sure


>
> [1] https://lore.kernel.org/linux-mm/86f89a65-f0f6-4462-9eea-ac691de2f3b6@suse.de/T/#mbf392eb390b8053f96be50da3b40dfd9b62dd389
>
>
Re: [PATCH] drivers/base/node: Handle error properly in register_one_node()
Posted by David Hildenbrand 7 months, 1 week ago
On 02.07.25 14:59, Donet Tom wrote:
> 
> On 7/2/25 6:16 PM, Oscar Salvador wrote:
>> On Wed, Jul 02, 2025 at 06:28:56AM -0500, Donet Tom wrote:
>>> If register_node() returns an error, it is not handled correctly.
>>> The function will proceed further and try to register CPUs under the
>>> node, which is not correct.
>>>
>>> So, in this patch, if register_node() returns an error, we return
>>> immediately from the function.
>>>
>>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>> ---
>>>
>> ...
>>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>>> index bef84f01712f..aec991b4c0b2 100644
>>> --- a/drivers/base/node.c
>>> +++ b/drivers/base/node.c
>>> @@ -885,6 +885,8 @@ int register_one_node(int nid)
>>>    	node_devices[nid] = node;
>>>    
>>>    	error = register_node(node_devices[nid], nid);
>>> +	if (error)
>>> +		return error;
>> Ok, all current callers (based on mm-unstable) panic or BUG() if this fails,
>> but powerpc, in init_phb_dynamic(), which keeps on going.
>> Unless it panics somewhere down the road as well.
>>
>> So I think we need to:
>>
>>    node_devices[nid] = NULL
>>    kfree(node)
>>
>>    ?
> 
> 
> Yes, I will add this too.
> 
> But one question: if register_node() fails, is it okay to continue, or
> should we panic?
> 
> What is the correct way to handle this?

panic() or BUG() is not the answer :)

Try to recover ...

-- 
Cheers,

David / dhildenb
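
One way to read "try to recover" is that a caller undoes its own state and propagates the error instead of calling panic() or BUG(). The sketch below is a userspace model of that idea only. `rb_try_online_node()`, `rb_register_one_node()`, and the `rb_register_result` knob are hypothetical names, and the online bitmap is a plain array; none of this is taken from the kernel's actual callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define RB_MAX_NODES 4

static bool rb_node_online[RB_MAX_NODES];
static int rb_register_result;	/* test knob: value the stub returns */

/* Stub for register_one_node(): returns whatever the knob says. */
static int rb_register_one_node(int nid)
{
	(void)nid;
	return rb_register_result;
}

/*
 * Hypothetical caller. Returns 1 if the node was newly brought online,
 * 0 if it was already online, or a negative errno on failure.
 */
static int rb_try_online_node(int nid)
{
	int ret;

	assert(nid >= 0 && nid < RB_MAX_NODES);

	if (rb_node_online[nid])
		return 0;

	rb_node_online[nid] = true;

	ret = rb_register_one_node(nid);
	if (ret) {
		/* Roll back instead of BUG(): the node stays offline. */
		rb_node_online[nid] = false;
		return ret;
	}

	return 1;
}
```

The failure leaves the system in the state it had before the call, so the operation can be retried or reported to whoever requested it.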
Re: [PATCH] drivers/base/node: Handle error properly in register_one_node()
Posted by Donet Tom 7 months ago
On 7/4/25 5:59 PM, David Hildenbrand wrote:
> On 02.07.25 14:59, Donet Tom wrote:
>>
>> On 7/2/25 6:16 PM, Oscar Salvador wrote:
>>> On Wed, Jul 02, 2025 at 06:28:56AM -0500, Donet Tom wrote:
>>>> If register_node() returns an error, it is not handled correctly.
>>>> The function will proceed further and try to register CPUs under the
>>>> node, which is not correct.
>>>>
>>>> So, in this patch, if register_node() returns an error, we return
>>>> immediately from the function.
>>>>
>>>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>>> ---
>>>>
>>> ...
>>>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>>>> index bef84f01712f..aec991b4c0b2 100644
>>>> --- a/drivers/base/node.c
>>>> +++ b/drivers/base/node.c
>>>> @@ -885,6 +885,8 @@ int register_one_node(int nid)
>>>>        node_devices[nid] = node;
>>>>
>>>>        error = register_node(node_devices[nid], nid);
>>>> +    if (error)
>>>> +        return error;
>>> Ok, all current callers (based on mm-unstable) panic or BUG() if this fails,
>>> but powerpc, in init_phb_dynamic(), which keeps on going.
>>> Unless it panics somewhere down the road as well.
>>>
>>> So I think we need to:
>>>
>>>    node_devices[nid] = NULL
>>>    kfree(node)
>>>
>>>    ?
>>
>>
>> Yes, I will add this too.
>>
>> But one question: if register_node() fails, is it okay to continue, or
>> should we panic?
>>
>> What is the correct way to handle this?
>
> panic() or BUG() is not the answer :)
>
> Try to recover ...

Got it, thank you very much, David.