[PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL

Javi Merino posted 1 patch 3 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/6b2db92de764b6031647926d27cb14dd455eff7d.1704809355.git.javi.merino@cloud.com
There is a newer version of this series
xen/common/dt-overlay.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
[PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Javi Merino 3 months, 3 weeks ago
In remove_nodes(), overlay_node is dereferenced when printing the
error message even though it is known to be NULLL.  Fix the error
message to avoid dereferencing a NULL pointer.

The semantic patch that spots this code is available in

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Javi Merino <javi.merino@cloud.com>
---
CC: Vikram Garhwal <vikram.garhwal@amd.com>

Vikram, I didn't know what to put in the error message.  Feel free to
suggest something more appropriate than "Device not present in the
tree".

---
 xen/common/dt-overlay.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 5663a049e90a..f04b0c276eea 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -428,8 +428,7 @@ static int remove_nodes(const struct overlay_track *tracker)
         overlay_node = (struct dt_device_node *)tracker->nodes_address[j];
         if ( overlay_node == NULL )
         {
-            printk(XENLOG_ERR "Device %s is not present in the tree. Removing nodes failed\n",
-                   overlay_node->full_name);
+            printk(XENLOG_ERR "Device not present in the tree. Removing nodes failed\n");
             return -EINVAL;
         }
 
-- 
2.42.0
Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Julien Grall 3 months, 3 weeks ago
Hi Javi,

Title: Any reason this is titled for-4.18? Shouldn't this patch also be 
merged in staging?

On 09/01/2024 14:19, Javi Merino wrote:
> In remove_nodes(), overlay_node is dereferenced when printing the
> error message even though it is known to be NULLL.  Fix the error

Typo: s/NULLL/NULL/

This can be fixed on commit if there is nothing else.

> message to avoid dereferencing a NULL pointer.
> 
> The semantic patch that spots this code is available in
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0

Good catch and glad to see that coccinelle can work on Xen. I am looking 
forward for more work in that area :).

> 
> Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
> Signed-off-by: Javi Merino <javi.merino@cloud.com>
c> ---
> CC: Vikram Garhwal <vikram.garhwal@amd.com>
> 
> Vikram, I didn't know what to put in the error message.  Feel free to
> suggest something more appropriate than "Device not present in the
> tree".

More questions for Vikram, looking at the code, it is not 100% clear in 
which condition overlay_node could be NULL. Is this a programming error? 
if so, maybe this should be an ASSERT_UNREACHABLE() (could be added 
separately) and it would be fine to print nothing.

Cheers,

-- 
Julien Grall
Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Vikram Garhwal 3 months, 3 weeks ago
Hi Javi,
Thank you for spotting and fixing this.
On Tue, Jan 09, 2024 at 03:31:55PM +0000, Julien Grall wrote:
> Hi Javi,
> 
> Title: Any reason this is titled for-4.18? Shouldn't this patch also be
> merged in staging?
> 
> On 09/01/2024 14:19, Javi Merino wrote:
> > In remove_nodes(), overlay_node is dereferenced when printing the
> > error message even though it is known to be NULLL.  Fix the error
> 
> Typo: s/NULLL/NULL/
> 
> This can be fixed on commit if there is nothing else.
> 
> > message to avoid dereferencing a NULL pointer.
> > 
> > The semantic patch that spots this code is available in
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0
> 
> Good catch and glad to see that coccinelle can work on Xen. I am looking
> forward for more work in that area :).
> 
> > 
> > Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
> > Signed-off-by: Javi Merino <javi.merino@cloud.com>
> c> ---
> > CC: Vikram Garhwal <vikram.garhwal@amd.com>
> > 
> > Vikram, I didn't know what to put in the error message.  Feel free to
> > suggest something more appropriate than "Device not present in the
> > tree".
> 
> More questions for Vikram, looking at the code, it is not 100% clear in
> which condition overlay_node could be NULL. Is this a programming error? if
> so, maybe this should be an ASSERT_UNREACHABLE() (could be added separately)
> and it would be fine to print nothing.
> 
This can happen with failures in add_nodes() function. add_nodes() failure will
try to call remove_nodes function. Depending on where add_nodes() is failed,
nodes_address may or may not be NULL.

We also added a detailed comment on this:
https://github.com/xen-project/xen/blob/5a3ace21f3d779b291a2d305824b2820d88de7f1/xen/common/dt-overlay.c#L816

For now, we can return from here without printing anything as error message will
be printed by the caller of remove_nodes() anyway.

> Cheers,
> 
> -- 
> Julien Grall
Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Javi Merino 3 months, 3 weeks ago
On Wed, Jan 10, 2024 at 12:25:57PM -0800, Vikram Garhwal wrote:
> Hi Javi,
> Thank you for spotting and fixing this.

Hi Vikram,

> On Tue, Jan 09, 2024 at 03:31:55PM +0000, Julien Grall wrote:
> > On 09/01/2024 14:19, Javi Merino wrote:
> > > In remove_nodes(), overlay_node is dereferenced when printing the
> > > error message even though it is known to be NULLL.  Fix the error
> > 
> > Typo: s/NULLL/NULL/
> > 
> > This can be fixed on commit if there is nothing else.
> > 
> > > message to avoid dereferencing a NULL pointer.
> > > 
> > > The semantic patch that spots this code is available in
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0
> > 
> > Good catch and glad to see that coccinelle can work on Xen. I am looking
> > forward for more work in that area :).
> > 
> > > 
> > > Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
> > > Signed-off-by: Javi Merino <javi.merino@cloud.com>
> > c> ---
> > > CC: Vikram Garhwal <vikram.garhwal@amd.com>
> > > 
> > > Vikram, I didn't know what to put in the error message.  Feel free to
> > > suggest something more appropriate than "Device not present in the
> > > tree".
> > 
> > More questions for Vikram, looking at the code, it is not 100% clear in
> > which condition overlay_node could be NULL. Is this a programming error? if
> > so, maybe this should be an ASSERT_UNREACHABLE() (could be added separately)
> > and it would be fine to print nothing.
> > 
> This can happen with failures in add_nodes() function. add_nodes() failure will
> try to call remove_nodes function. Depending on where add_nodes() is failed,
> nodes_address may or may not be NULL.
> 
> We also added a detailed comment on this:
> https://github.com/xen-project/xen/blob/5a3ace21f3d779b291a2d305824b2820d88de7f1/xen/common/dt-overlay.c#L816
> 
> For now, we can return from here without printing anything as error message will
> be printed by the caller of remove_nodes() anyway.

Ok, I'll send a v2 without the printk and add this explanation to the commit message.

Thanks!
Javi
Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Javi Merino 3 months, 3 weeks ago
On Tue, Jan 09, 2024 at 03:31:55PM +0000, Julien Grall wrote:
> Hi Javi,
> 
> Title: Any reason this is titled for-4.18? Shouldn't this patch also be
> merged in staging?

This is for staging and 4.18.  If the tag "for-4.18" meant "this is
only for 4.18", then that's not what I meant.  Sorry for that.

> On 09/01/2024 14:19, Javi Merino wrote:
> > In remove_nodes(), overlay_node is dereferenced when printing the
> > error message even though it is known to be NULLL.  Fix the error
> 
> Typo: s/NULLL/NULL/
> 
> This can be fixed on commit if there is nothing else.
> 
> > message to avoid dereferencing a NULL pointer.
> > 
> > The semantic patch that spots this code is available in
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0
> 
> Good catch and glad to see that coccinelle can work on Xen. I am looking
> forward for more work in that area :).

I'm working on a series to add a "make coccicheck" to Xen, similar to
the coccicheck in the kernel.  All the other semantic patches that I
think are relevant to us only give false positives, but I think it
would still be beneficial to have them.

Cheers,
Javi
Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Julien Grall 3 months, 3 weeks ago
Hi Javi,

On 09/01/2024 15:42, Javi Merino wrote:
> On Tue, Jan 09, 2024 at 03:31:55PM +0000, Julien Grall wrote:
>> Hi Javi,
>>
>> Title: Any reason this is titled for-4.18? Shouldn't this patch also be
>> merged in staging?
> 
> This is for staging and 4.18.  If the tag "for-4.18" meant "this is
> only for 4.18", then that's not what I meant.  Sorry for that.

We usually use "for-4.XX" during code freeze to indicate whether a patch 
should be part of the upcoming relase of the next week. Hence my confusion.

Outside of the code freeze, we sometimes add the tag "Backport: 4.XX+" 
just above the Signed-off-by in addition to the Fixes tag to indicate 
how far the backport should go. The Fixes tag is also sufficient.

As a side node, this is fixing experimental code. So in general we would 
not backport such patch (we only do backport for supported features). 
This is because there are no guarantee that an experimental would not 
crash Xen.

Although, the tag is still useful for downstream that may have decided 
to take the patch in (I think AMD/Xilinx is using them) and accept some 
of the risks.

Stefano is the person doing the backport for Arm. So I will let him 
decide whether he wants to backport it.

Cheers,

-- 
Julien Grall
Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL
Posted by Javi Merino 3 months, 3 weeks ago
On Tue, Jan 09, 2024 at 03:50:38PM +0000, Julien Grall wrote:
> Hi Javi,
> 
> On 09/01/2024 15:42, Javi Merino wrote:
> > On Tue, Jan 09, 2024 at 03:31:55PM +0000, Julien Grall wrote:
> > > Hi Javi,
> > > 
> > > Title: Any reason this is titled for-4.18? Shouldn't this patch also be
> > > merged in staging?
> > 
> > This is for staging and 4.18.  If the tag "for-4.18" meant "this is
> > only for 4.18", then that's not what I meant.  Sorry for that.
> 
> We usually use "for-4.XX" during code freeze to indicate whether a patch
> should be part of the upcoming relase of the next week. Hence my confusion.

Ok, I know for next time.  Thanks for the clarification.

> Outside of the code freeze, we sometimes add the tag "Backport: 4.XX+" just
> above the Signed-off-by in addition to the Fixes tag to indicate how far the
> backport should go. The Fixes tag is also sufficient.
> 
> As a side node, this is fixing experimental code. So in general we would not
> backport such patch (we only do backport for supported features). This is
> because there are no guarantee that an experimental would not crash Xen.
> 
> Although, the tag is still useful for downstream that may have decided to
> take the patch in (I think AMD/Xilinx is using them) and accept some of the
> risks.
> 
> Stefano is the person doing the backport for Arm. So I will let him decide
> whether he wants to backport it.

Fair enough.

Cheers,
Javi