[XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()

Javi Merino posted 1 patch 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/ad7c89bbae34155566ae7c9ca2cb501f21c7d585.1724330921.git.javi.merino@cloud.com
tools/libs/light/libxl_console.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Javi Merino 3 months ago
When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
call in main_dmesg().  ASAN reports a heap buffer overflow: an
off-by-one access to cr->buffer.

The readconsole sysctl copies up to count characters into the buffer,
but it does not add a null character at the end.  Despite the
documentation of libxl_xen_console_read_line(), line_r is not
nul-terminated if 16384 characters were copied to the buffer.

Fix this by making count one less that the size of the allocated
buffer so that the last byte is always null.

Reported-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Javi Merino <javi.merino@cloud.com>
---
 tools/libs/light/libxl_console.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_console.c b/tools/libs/light/libxl_console.c
index a563c9d3c7f9..fa28e2139453 100644
--- a/tools/libs/light/libxl_console.c
+++ b/tools/libs/light/libxl_console.c
@@ -779,7 +779,7 @@ libxl_xen_console_reader *
     cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
     cr->buffer = libxl__zalloc(NOGC, size);
     cr->size = size;
-    cr->count = size;
+    cr->count = size - 1;
     cr->clear = clear;
     cr->incremental = 1;
 
-- 
2.44.0


Re: [XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Andrew Cooper 3 months ago
On 22/08/2024 2:13 pm, Javi Merino wrote:
> When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
> call in main_dmesg().  ASAN reports a heap buffer overflow: an
> off-by-one access to cr->buffer.
>
> The readconsole sysctl copies up to count characters into the buffer,
> but it does not add a null character at the end.  Despite the
> documentation of libxl_xen_console_read_line(), line_r is not
> nul-terminated if 16384 characters were copied to the buffer.
>
> Fix this by making count one less that the size of the allocated
> buffer so that the last byte is always null.
>
> Reported-by: Edwin Török <edwin.torok@cloud.com>
> Signed-off-by: Javi Merino <javi.merino@cloud.com>
> ---
>  tools/libs/light/libxl_console.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/libs/light/libxl_console.c b/tools/libs/light/libxl_console.c
> index a563c9d3c7f9..fa28e2139453 100644
> --- a/tools/libs/light/libxl_console.c
> +++ b/tools/libs/light/libxl_console.c
> @@ -779,7 +779,7 @@ libxl_xen_console_reader *
>      cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
>      cr->buffer = libxl__zalloc(NOGC, size);
>      cr->size = size;
> -    cr->count = size;
> +    cr->count = size - 1;
>      cr->clear = clear;
>      cr->incremental = 1;
>  

This looks like it will fix the ASAN issue, but I think a better fix
would be:

-        printf("%s", line);
+       printf("%.*s", cr->count, line);

because otherwise there's a world of sharp corners once Xen has wrapped
the buffer for the first time.


Which brings us a lot of other WTFs in this code...

First, libxl_console.c describes it's functionality in terms of lines,
and line_reader() in the API.  Yet it's not lines, it's a 16k buffer
with generally multi-line content.  It's too late to fix the naming, but
we could at least rewrite the comments not to be blatant lies.


Just out of context above the hunk is:

    unsigned int size = 16384;

which isn't accurate.  The size of Xen's console ring can be changed at
compile time (like XenServer does), and/or by command line variable.

Because the dmesg ring is never full (it just keeps producing and
overwriting tail data), it's impossible to get a clean copy except in a
single hypercall; the incremental/offset parameters are an illusion, and
do not function correctly if a new printk() has occurred between
adjacent hypercalls.

And that is why setting count to size - 1 probably isn't wise.  It means
that, even in the ideal case where Xen's ringbuffer is 16k, you've still
got to make 2 hypercalls to get the content.

~Andrew

Re: [XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Javi Merino 3 months ago
On Fri, Aug 23, 2024 at 10:34:05AM GMT, Andrew Cooper wrote:
> On 22/08/2024 2:13 pm, Javi Merino wrote:
> > When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
> > call in main_dmesg().  ASAN reports a heap buffer overflow: an
> > off-by-one access to cr->buffer.
> >
> > The readconsole sysctl copies up to count characters into the buffer,
> > but it does not add a null character at the end.  Despite the
> > documentation of libxl_xen_console_read_line(), line_r is not
> > nul-terminated if 16384 characters were copied to the buffer.
> >
> > Fix this by making count one less that the size of the allocated
> > buffer so that the last byte is always null.
> >
> > Reported-by: Edwin Török <edwin.torok@cloud.com>
> > Signed-off-by: Javi Merino <javi.merino@cloud.com>
> > ---
> >  tools/libs/light/libxl_console.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/libs/light/libxl_console.c b/tools/libs/light/libxl_console.c
> > index a563c9d3c7f9..fa28e2139453 100644
> > --- a/tools/libs/light/libxl_console.c
> > +++ b/tools/libs/light/libxl_console.c
> > @@ -779,7 +779,7 @@ libxl_xen_console_reader *
> >      cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
> >      cr->buffer = libxl__zalloc(NOGC, size);
> >      cr->size = size;
> > -    cr->count = size;
> > +    cr->count = size - 1;
> >      cr->clear = clear;
> >      cr->incremental = 1;
> >  
> 
> This looks like it will fix the ASAN issue, but I think a better fix
> would be:
> 
> -        printf("%s", line);
> +       printf("%.*s", cr->count, line);
> 
> because otherwise there's a world of sharp corners once Xen has wrapped
> the buffer for the first time.
> 
> 
> Which brings us a lot of other WTFs in this code...
> 
> First, libxl_console.c describes it's functionality in terms of lines,
> and line_reader() in the API.  Yet it's not lines, it's a 16k buffer
> with generally multi-line content.  It's too late to fix the naming, but
> we could at least rewrite the comments not to be blatant lies.
> 
> 
> Just out of context above the hunk is:
> 
>     unsigned int size = 16384;
> 
> which isn't accurate.  The size of Xen's console ring can be changed at
> compile time (like XenServer does), and/or by command line variable.
> 
> Because the dmesg ring is never full (it just keeps producing and
> overwriting tail data), it's impossible to get a clean copy except in a
> single hypercall; the incremental/offset parameters are an illusion, and
> do not function correctly if a new printk() has occurred between
> adjacent hypercalls.
> 
> And that is why setting count to size - 1 probably isn't wise.  It means
> that, even in the ideal case where Xen's ringbuffer is 16k, you've still
> got to make 2 hypercalls to get the content.

You are right, having to do 2 hypercalls in the ideal case is not
good.  I'm going to get rid of cr->count, and make
libxl_xen_console_read_line() honour its documentation of returning a
nul-terminated string.  I'll just make the allocation one char bigger,
which was the fix I originally had.

Cheers,
Javi


Re: [XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Anthony PERARD 3 months ago
On Fri, Aug 23, 2024 at 10:34:05AM +0100, Andrew Cooper wrote:
> On 22/08/2024 2:13 pm, Javi Merino wrote:
> This looks like it will fix the ASAN issue, but I think a better fix
> would be:
> 
> -        printf("%s", line);
> +       printf("%.*s", cr->count, line);

nul-terminated string are safer to manipulate, and cr->count isn't
available outside of libxl, so that's a no go.

> because otherwise there's a world of sharp corners once Xen has wrapped
> the buffer for the first time.

If wrapped buffer are visible to libxl or `xl dmesg`, that a bug in xen,
in XEN_SYSCTL_readconsole. It doesn't looks like it's possible to know
when the console ring buffer wrapped.

> 
> Which brings us a lot of other WTFs in this code...
> 
> First, libxl_console.c describes it's functionality in terms of lines,
> and line_reader() in the API.  Yet it's not lines, it's a 16k buffer
> with generally multi-line content.  It's too late to fix the naming, but
> we could at least rewrite the comments not to be blatant lies.
> 
> 
> Just out of context above the hunk is:
> 
>     unsigned int size = 16384;
> 
> which isn't accurate.  The size of Xen's console ring can be changed at
> compile time (like XenServer does), and/or by command line variable.

This buffer size isn't link to Xen's console ring size, both value don't
need to match.

> Because the dmesg ring is never full (it just keeps producing and
> overwriting tail data), it's impossible to get a clean copy except in a
> single hypercall; the incremental/offset parameters are an illusion, and
> do not function correctly if a new printk() has occurred between
> adjacent hypercalls.

Well, let's hope there's no more that "ring_size - size" is been written
to the ring in between two hypercall. There's doesn't seems to be
anything that can be done to work around this with this hypercall
(beside using a huge buffer size).

I think xenconsoled can read the console ring buffer, but does so
continuously, so if `xl dmesg` have some shortcoming, it might be fine.

> And that is why setting count to size - 1 probably isn't wise.  It means
> that, even in the ideal case where Xen's ringbuffer is 16k, you've still
> got to make 2 hypercalls to get the content.

Well, I've run `xl dmesg` on one machine, it seems to have taken 9
hypercall to get everything, and there's still the first few line of the
boot available. So if the ringbuffer was indeed 16k, it would not be
ideal at all... as lots of stuff would have been overwritten.

Cheers,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Jan Beulich 3 months ago
On 22.08.2024 15:13, Javi Merino wrote:
> When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
> call in main_dmesg().  ASAN reports a heap buffer overflow: an
> off-by-one access to cr->buffer.
> 
> The readconsole sysctl copies up to count characters into the buffer,
> but it does not add a null character at the end.  Despite the
> documentation of libxl_xen_console_read_line(), line_r is not
> nul-terminated if 16384 characters were copied to the buffer.
> 
> Fix this by making count one less that the size of the allocated
> buffer so that the last byte is always null.
> 
> Reported-by: Edwin Török <edwin.torok@cloud.com>
> Signed-off-by: Javi Merino <javi.merino@cloud.com>

Perhaps wants a Fixes: tag as well?

> --- a/tools/libs/light/libxl_console.c
> +++ b/tools/libs/light/libxl_console.c
> @@ -779,7 +779,7 @@ libxl_xen_console_reader *
>      cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
>      cr->buffer = libxl__zalloc(NOGC, size);
>      cr->size = size;
> -    cr->count = size;
> +    cr->count = size - 1;
>      cr->clear = clear;
>      cr->incremental = 1;

While this looks to be addressing the issue at hand, I still wonder: Why
does the "count" field exist at all? It's certainly odd to be set right
here (when the buffer actually is empty). It's used solely in
libxl_xen_console_read_line(), so could be a local variable there.

Then, further, I wonder why struct libxl__xen_console_reader lives in
libxl_internal.h - it's used solely in libxl_console.c.

Finally - why libxl__zalloc() when libxl_xen_console_read_line() clears
the buffer anyway?

Jan

Re: [XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Anthony PERARD 3 months ago
On Fri, Aug 23, 2024 at 08:31:50AM +0200, Jan Beulich wrote:
> On 22.08.2024 15:13, Javi Merino wrote:
> > When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
> > call in main_dmesg().  ASAN reports a heap buffer overflow: an
> > off-by-one access to cr->buffer.
> > 
> > The readconsole sysctl copies up to count characters into the buffer,
> > but it does not add a null character at the end.  Despite the
> > documentation of libxl_xen_console_read_line(), line_r is not
> > nul-terminated if 16384 characters were copied to the buffer.
> > 
> > Fix this by making count one less that the size of the allocated
> > buffer so that the last byte is always null.
> > 
> > Reported-by: Edwin Török <edwin.torok@cloud.com>
> > Signed-off-by: Javi Merino <javi.merino@cloud.com>
> 
> Perhaps wants a Fixes: tag as well?

Seems to be:
Fixes: 4024bae739cc ("xl: Add subcommand 'xl dmesg'")

> > --- a/tools/libs/light/libxl_console.c
> > +++ b/tools/libs/light/libxl_console.c
> > @@ -779,7 +779,7 @@ libxl_xen_console_reader *
> >      cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
> >      cr->buffer = libxl__zalloc(NOGC, size);
> >      cr->size = size;
> > -    cr->count = size;
> > +    cr->count = size - 1;
> >      cr->clear = clear;
> >      cr->incremental = 1;
> 
> While this looks to be addressing the issue at hand, I still wonder: Why
> does the "count" field exist at all? It's certainly odd to be set right
> here (when the buffer actually is empty). It's used solely in
> libxl_xen_console_read_line(), so could be a local variable there.

It isn't only odd to have "count" field, it is also used the wrong way.
Once "cr->count" is set to 0, the next call to
libxl_xen_console_read_line() will simply return an empty NULL, even if
in the mean time more data is available to read from the string.
Also, if the last call to libxl_xen_console_read_line() was shorter than
the buffer size, none of the following call will use the full size of
the buffer. This isn't really a problem for `xl dmesg`, but could be if
we want to implement "--follow" option for example.

So Javi, could you remove that `cr->count` field and use a local
variable instead?

And a comment in the code might be useful about why there's a "-1".

But I think a better way to address the issue would be to actually set
'\0' at the end of the string after the call to xc_readconsolering(),
and remove the not very useful memset(0). That way, it will be more
explicite as to why we need "-1".

> Then, further, I wonder why struct libxl__xen_console_reader lives in
> libxl_internal.h - it's used solely in libxl_console.c.

History? It actually use to live in libxl.h, 14 yr ago.

> Finally - why libxl__zalloc() when libxl_xen_console_read_line() clears
> the buffer anyway?

Overuse of libxl__zalloc when it was use to replace the open coding that
was used to allocate "cr". While it doesn't seems necessary, I don't
think it hurt to keep it there.

Thanks,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [XEN PATCH v1] libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
Posted by Javi Merino 3 months ago
On Fri, Aug 23, 2024 at 10:14:32AM GMT, Anthony PERARD wrote:
> On Fri, Aug 23, 2024 at 08:31:50AM +0200, Jan Beulich wrote:
> > On 22.08.2024 15:13, Javi Merino wrote:
> > > When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
> > > call in main_dmesg().  ASAN reports a heap buffer overflow: an
> > > off-by-one access to cr->buffer.
> > >
> > > The readconsole sysctl copies up to count characters into the buffer,
> > > but it does not add a null character at the end.  Despite the
> > > documentation of libxl_xen_console_read_line(), line_r is not
> > > nul-terminated if 16384 characters were copied to the buffer.
> > >
> > > Fix this by making count one less that the size of the allocated
> > > buffer so that the last byte is always null.
> > >
> > > Reported-by: Edwin Török <edwin.torok@cloud.com>
> > > Signed-off-by: Javi Merino <javi.merino@cloud.com>
> >
> > Perhaps wants a Fixes: tag as well?
> 
> Seems to be:
> Fixes: 4024bae739cc ("xl: Add subcommand 'xl dmesg'")

Yes, I'll add it.

> > > --- a/tools/libs/light/libxl_console.c
> > > +++ b/tools/libs/light/libxl_console.c
> > > @@ -779,7 +779,7 @@ libxl_xen_console_reader *
> > >      cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
> > >      cr->buffer = libxl__zalloc(NOGC, size);
> > >      cr->size = size;
> > > -    cr->count = size;
> > > +    cr->count = size - 1;
> > >      cr->clear = clear;
> > >      cr->incremental = 1;
> >
> > While this looks to be addressing the issue at hand, I still wonder: Why
> > does the "count" field exist at all? It's certainly odd to be set right
> > here (when the buffer actually is empty). It's used solely in
> > libxl_xen_console_read_line(), so could be a local variable there.

True.  I just wanted to fix the actual bug, but I guess while we are
at it we can improve this piece of code.

> It isn't only odd to have "count" field, it is also used the wrong way.
> Once "cr->count" is set to 0, the next call to
> libxl_xen_console_read_line() will simply return an empty NULL, even if
> in the mean time more data is available to read from the string.
> Also, if the last call to libxl_xen_console_read_line() was shorter than
> the buffer size, none of the following call will use the full size of
> the buffer. This isn't really a problem for `xl dmesg`, but could be if
> we want to implement "--follow" option for example.
> 
> So Javi, could you remove that `cr->count` field and use a local
> variable instead?

Sure, I'll do that.

> And a comment in the code might be useful about why there's a "-1".
> 
> But I think a better way to address the issue would be to actually set
> '\0' at the end of the string after the call to xc_readconsolering(),
> and remove the not very useful memset(0). That way, it will be more
> explicite as to why we need "-1".

Right, I will get rid of the memset(0) and just nul-terminate the string.

> > Then, further, I wonder why struct libxl__xen_console_reader lives in
> > libxl_internal.h - it's used solely in libxl_console.c.
> 
> History? It actually use to live in libxl.h, 14 yr ago.
>
> > Finally - why libxl__zalloc() when libxl_xen_console_read_line() clears
> > the buffer anyway?
> 
> Overuse of libxl__zalloc when it was use to replace the open coding that
> was used to allocate "cr". While it doesn't seems necessary, I don't
> think it hurt to keep it there.

It used to be a malloc() that was changed to a zalloc() in
39eaabdf4131 (libxl: don't leak buf in libxl_xen_console_read_start
error handling, 2013-12-03).  The comment in the commit message is
wrong, the malloc+memset was being done for libxl_xen_console_reader
(the struct) not for the buffer.

It doesn't hurt to keep it, but if I'm making changes in this area, I
may as well just make it a libxl__malloc() .

Cheers,
Javi