[RESEND] [RFC PATCH] xen/arm: domain_build: Ignore empty memory bank

Elliott Mitchell posted 1 patch 3 years, 3 months ago
Test gitlab-ci passed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/X+DbupqYE3rrFaIM@mattapan.m5p.com
xen/arch/arm/domain_build.c | 5 +++++
1 file changed, 5 insertions(+)
[RESEND] [RFC PATCH] xen/arm: domain_build: Ignore empty memory bank
Posted by Elliott Mitchell 3 years, 3 months ago
Previously Xen had stopped processing Device Trees if an empty
(size == 0) memory bank was found.

Commit 5a37207df52066efefe419c677b089a654d37afc changed this behavior to
ignore such banks.  Unfortunately this means these empty nodes are
visible to code which accesses the device trees.  Have domain_build also
ignore these entries.

---
This is tagged "RFC" due to issues.

Authorship of this is unclean.  In the first version (checked in, but
never sent to the list and never compiled) a different condition was used
and the comment was absent.  When examing the code it became clear a
condition identical to
5a37207df52066efefe419c677b089a654d37afc was appropriate and so I changed
to !size.  Since what the code is doing was sufficiently similar, the
comment was grabbed.
How far does this dilute authorship?  I diagnosed the bug and figured out
where to add the lines, but the amount inspired by Julien Grall gives
Julien Grall some level of claim of authorship.  Advice is needed.

Commit 7d2b21fd36c2a47799eed71c67bae7faa1ec4272 is an outright bug for
me.  I don't know what percentage of users will experience this bug, but
being observed this quickly suggests this is major enough to be urgent
for the stable-4.14 branch.

I doubt this is the only bug exposed by
5a37207df52066efefe419c677b089a654d37afc.  This might actually effect
most uses of the device-tree code.  I think either the core needs to be
fixed to hide zero-sized entries from anything outside of
xen/common/device_tree.c, otherwise all uses of the device-tree core need
to be audited to ensure they ignore zero-sized entries.  Notably this is
the second location where zero-size device-tree entries need to be
ignored, preemptive action should be taken before a third is found by
bugreport.

Perhaps this fix is appropriate for the stable-4.14 branch and a proper
solution should be implemented for the main branch?

The error message which first showed was
"Unable to retrieve address %u for %s\n".  Where the number in %u was
0, this seems a poor error message.  Version 0.1 (which never got
compiled) had been:  if(!addr) continue;

As I thought the 0 it was reporting was an address of 0.  Perhaps the
message should instead be:
"Unable to retrieve address for index %u of %s\n"?
---
 xen/arch/arm/domain_build.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e824ba34b0..0b83384bd3 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1405,6 +1405,11 @@ static int __init handle_device(struct domain *d, struct dt_device_node *dev,
     {
         struct map_range_data mr_data = { .d = d, .p2mt = p2mt };
         res = dt_device_get_address(dev, i, &addr, &size);
+
+        /* Some DT may describe empty bank, ignore them */
+        if ( !size )
+            continue;
+
         if ( res )
         {
             printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Re: [RESEND] [RFC PATCH] xen/arm: domain_build: Ignore empty memory bank
Posted by Julien Grall 3 years, 3 months ago
Hi Elliott,

I was planning to review the first version today, but as you sent a new 
version I will answer on this one directly.

On 21/12/2020 17:30, Elliott Mitchell wrote:
> Previously Xen had stopped processing Device Trees if an empty
> (size == 0) memory bank was found.
> 
> Commit 5a37207df52066efefe419c677b089a654d37afc changed this behavior to
> ignore such banks.  Unfortunately this means these empty nodes are
> visible to code which accesses the device trees.  Have domain_build also
> ignore these entries.

I am probably missing something here. The commit you pointed out will 
only take care of memory nodes (including reserved-memory).

It should not be possible to reach handle_device() with actual RAM. 
Although, it would with the reserved memory node.

Could you provide a bit more details on the issue? In particular, I am 
interested to see the offending node and its content.

> ---
> This is tagged "RFC" due to issues.
> 
> Authorship of this is unclean.  In the first version (checked in, but
> never sent to the list and never compiled) a different condition was used
> and the comment was absent.  When examing the code it became clear a
> condition identical to
> 5a37207df52066efefe419c677b089a654d37afc was appropriate and so I changed
> to !size.  Since what the code is doing was sufficiently similar, the
> comment was grabbed.
> How far does this dilute authorship?  I diagnosed the bug and figured out
> where to add the lines, but the amount inspired by Julien Grall gives
> Julien Grall some level of claim of authorship.  Advice is needed.

You did all the investigation of the bug and the code is small enough. 
So I think it is fine for you to claim the authorship.

> 
> Commit 7d2b21fd36c2a47799eed71c67bae7faa1ec4272 is an outright bug for
> me.  I don't know what percentage of users will experience this bug, but
> being observed this quickly suggests this is major enough to be urgent
> for the stable-4.14 branch.

We usually work on fix for upstream first and then backport it.

> I doubt this is the only bug exposed by
> 5a37207df52066efefe419c677b089a654d37afc.

Are you saying that with my patch dropped, Xen will boot but with it 
will not?

> This might actually effect
> most uses of the device-tree code.  I think either the core needs to be
> fixed to hide zero-sized entries from anything outside of
> xen/common/device_tree.c, otherwise all uses of the device-tree core need
> to be audited to ensure they ignore zero-sized entries.

The meaning of zero-sized is not the same everywhere. In the case of 
memory banks, it can be safely ignored.

For other devices (e.g. GIC), hiding it may make things worse because a 
size 0 means the node is bogus.

> Notably this is
> the second location where zero-size device-tree entries need to be
> ignored, preemptive action should be taken before a third is found by
> bugreport. >
> Perhaps this fix is appropriate for the stable-4.14 branch and a proper
> solution should be implemented for the main branch?
> 
> The error message which first showed was
> "Unable to retrieve address %u for %s\n".  Where the number in %u was
> 0, this seems a poor error message.  Version 0.1 (which never got
> compiled) had been:  if(!addr) continue;

Usually, dt_get_address() will fail because it can't translate the 
address, not because the size is 0. The more...

> 
> As I thought the 0 it was reporting was an address of 0.  Perhaps the
> message should instead be:
> "Unable to retrieve address for index %u of %s\n"?
> ---
>   xen/arch/arm/domain_build.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index e824ba34b0..0b83384bd3 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1405,6 +1405,11 @@ static int __init handle_device(struct domain *d, struct dt_device_node *dev,
>       {
>           struct map_range_data mr_data = { .d = d, .p2mt = p2mt };
>           res = dt_device_get_address(dev, i, &addr, &size);
> +
> +        /* Some DT may describe empty bank, ignore them */
> +        if ( !size )
> +            continue;

... dt_device_get_address() will not set the size if the node is bogus. 
So you can't rely on either addr or size when res is non-zero.

handle_device() (at least on unstable) will not initialize the two 
variables to 0. So I guess you are lucky that you compiler zeroed them 
for you, but that's not the normal behavior.

So I think we first need to figure out what is the offending node and 
why it is dt_device_get_address() is returning an error for it.

That said, I agree that we possibly want a check size == 0 (action TBD) 
in map_range_to_domain() as the code would do the wrong thing for 0.

>           if ( res )
>           {
>               printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> 

Cheers,

-- 
Julien Grall

Re: [RESEND] [RFC PATCH] xen/arm: domain_build: Ignore empty memory bank
Posted by Elliott Mitchell 3 years, 3 months ago
On Mon, Dec 21, 2020 at 06:28:35PM +0000, Julien Grall wrote:
> I was planning to review the first version today, but as you sent a new 
> version I will answer on this one directly.

Mostly the commentary has been increasing, not so much the commit.

> On 21/12/2020 17:30, Elliott Mitchell wrote:
> > Previously Xen had stopped processing Device Trees if an empty
> > (size == 0) memory bank was found.
> > 
> > Commit 5a37207df52066efefe419c677b089a654d37afc changed this behavior to
> > ignore such banks.  Unfortunately this means these empty nodes are
> > visible to code which accesses the device trees.  Have domain_build also
> > ignore these entries.
> 
> I am probably missing something here. The commit you pointed out will 
> only take care of memory nodes (including reserved-memory).
> 
> It should not be possible to reach handle_device() with actual RAM. 
> Although, it would with the reserved memory node.
> 
> Could you provide a bit more details on the issue? In particular, I am 
> interested to see the offending node and its content.

Define "see" in this context.  The message which shows up is:
"Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0"

According to Linux "name", "reg" and "resets" exist there.

(which as stated "0" looks suspiciously like NULL, rather than an
index)

> > I doubt this is the only bug exposed by
> > 5a37207df52066efefe419c677b089a654d37afc.
> 
> Are you saying that with my patch dropped, Xen will boot but with it 
> will not?

Hmm, realizing that I'd found a fix, but not actually tested to
confirm...   Seems I had it wrong, dropping
5a37207df52066efefe419c677b089a654d37afc doesn't fix the issue, so that
is NOT the cause.

Sorry about misattributing the cause.  Now to start doing builds to
identify the cause...  (this of course will take a bit)

Hmm, this is being rather more difficult to identify than I expected.


> > As I thought the 0 it was reporting was an address of 0.  Perhaps the
> > message should instead be:
> > "Unable to retrieve address for index %u of %s\n"?
> > ---
> >   xen/arch/arm/domain_build.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index e824ba34b0..0b83384bd3 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -1405,6 +1405,11 @@ static int __init handle_device(struct domain *d, struct dt_device_node *dev,
> >       {
> >           struct map_range_data mr_data = { .d = d, .p2mt = p2mt };
> >           res = dt_device_get_address(dev, i, &addr, &size);
> > +
> > +        /* Some DT may describe empty bank, ignore them */
> > +        if ( !size )
> > +            continue;
> 
> ... dt_device_get_address() will not set the size if the node is bogus. 
> So you can't rely on either addr or size when res is non-zero.
> 
> handle_device() (at least on unstable) will not initialize the two 
> variables to 0. So I guess you are lucky that you compiler zeroed them 
> for you, but that's not the normal behavior.
> 
> So I think we first need to figure out what is the offending node and 
> why it is dt_device_get_address() is returning an error for it.
> 
> That said, I agree that we possibly want a check size == 0 (action TBD) 
> in map_range_to_domain() as the code would do the wrong thing for 0.

Seems like returning to a working build without that commit is going to
take a bit.  :-(


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445