[PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly

Jason Wang posted 25 patches 3 years, 5 months ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Zhang Chen <chen.zhang@intel.com>, Li Zhijian <lizhijian@fujitsu.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
[PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
Posted by Jason Wang 3 years, 5 months ago
From: Zhang Chen <chen.zhang@intel.com>

When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63

So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo.c       | 9 ++++++++-
 net/trace-events | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/colo.c b/net/colo.c
index 694f3c9..6b0ff56 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
     static const uint8_t vlan[] = {0x81, 0x00};
     uint8_t *data = pkt->data + pkt->vnet_hdr_len;
     uint16_t l3_proto;
-    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+    ssize_t l2hdr_len;
+
+    if (data == NULL) {
+        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
+                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
+        return 1;
+    }
+    l2hdr_len = eth_get_l2_hdr_length(data);
 
     if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
         trace_colo_proxy_main("pkt->size < ETH_HLEN");
diff --git a/net/trace-events b/net/trace-events
index d7a1725..6af927b 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
 
 # colo.c
 colo_proxy_main(const char *chr) ": %s"
+colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
 
 # colo-compare.c
 colo_compare_main(const char *chr) ": %s"
-- 
2.7.4
Re: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
Posted by Peter Maydell 3 years, 4 months ago
On Wed, 20 Jul 2022 at 10:04, Jason Wang <jasowang@redhat.com> wrote:
>
> From: Zhang Chen <chen.zhang@intel.com>
>
> When COLO use only one vnet_hdr_support parameter between
> filter-redirector and filter-mirror(or colo-compare), COLO will crash
> with segmentation fault. Back track as follow:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>     at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
> (gdb) bt
> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>     at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
> net/colo.c:49
> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
> net/filter-rewriter.c:63
>
> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
> raise error and add trace-events to track vnet_hdr_len.
>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Hi -- Coverity points out that there's a problem with this fix
(CID 1490786):

> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
>      static const uint8_t vlan[] = {0x81, 0x00};
>      uint8_t *data = pkt->data + pkt->vnet_hdr_len;

data here is set to pkt->data + pkt->vnet_hdr_len.
If pkt->data is NULL then this addition is C undefined behaviour,
and if pkt->data is not NULL but the integer added is such
that the pointer ends up not pointing within data, then that
is also C undefined behaviour...

>      uint16_t l3_proto;
> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> +    ssize_t l2hdr_len;
> +
> +    if (data == NULL) {

...so the compiler is allowed to assume that data cannot be NULL
here, and this entire error checking clause is logically dead code.

If you're trying to check whether vnet_hdr_len is valid, you
need to do that as an integer arithmetic check before you start
using it for pointer arithmetic. And you probably want to be
checking against some kind of bound, not just for whether the
result is going to be NULL.

Overall this looks kinda sketchy and could probably use more
detailed code review. Where does the bogus vnet_hdr_len come from in
the first place? Is this data from the guest, or from the network
(ie uncontrolled)?

> +        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
> +        return 1;
> +    }

thanks
-- PMM
Re: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
Posted by Jason Wang 3 years, 4 months ago
在 2022/7/29 21:58, Peter Maydell 写道:
> On Wed, 20 Jul 2022 at 10:04, Jason Wang <jasowang@redhat.com> wrote:
>> From: Zhang Chen <chen.zhang@intel.com>
>>
>> When COLO use only one vnet_hdr_support parameter between
>> filter-redirector and filter-mirror(or colo-compare), COLO will crash
>> with segmentation fault. Back track as follow:
>>
>> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
>> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
>> (gdb) bt
>> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
>> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
>> net/colo.c:49
>> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
>> net/filter-rewriter.c:63
>>
>> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
>> raise error and add trace-events to track vnet_hdr_len.
>>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Hi -- Coverity points out that there's a problem with this fix
> (CID 1490786):
>
>> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
>>       static const uint8_t vlan[] = {0x81, 0x00};
>>       uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> data here is set to pkt->data + pkt->vnet_hdr_len.
> If pkt->data is NULL then this addition is C undefined behaviour,
> and if pkt->data is not NULL but the integer added is such
> that the pointer ends up not pointing within data, then that
> is also C undefined behaviour...


Right. And I don't see how pkt->data can be NULL, maybe a hint of bug 
elsewhere.


>
>>       uint16_t l3_proto;
>> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
>> +    ssize_t l2hdr_len;
>> +
>> +    if (data == NULL) {
> ...so the compiler is allowed to assume that data cannot be NULL
> here, and this entire error checking clause is logically dead code.
>
> If you're trying to check whether vnet_hdr_len is valid, you
> need to do that as an integer arithmetic check before you start
> using it for pointer arithmetic. And you probably want to be
> checking against some kind of bound, not just for whether the
> result is going to be NULL.


Or we can let the caller to validate the Pkt before calling this function.


>
> Overall this looks kinda sketchy and could probably use more
> detailed code review. Where does the bogus vnet_hdr_len come from in
> the first place? Is this data from the guest, or from the network
> (ie uncontrolled)?


If I understand correctly the vnet_hdr_len is set during handshake 
(net_fill_rstate()) which is sent from a network backend.

Chen or Xu, please post a fix and explain what happens in the changelog.

Thanks


>
>> +        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
>> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
>> +        return 1;
>> +    }
> thanks
> -- PMM
>


RE: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
Posted by Zhang, Chen 3 years, 4 months ago

> -----Original Message-----
> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, August 1, 2022 12:18 PM
> To: Zhang, Chen <chen.zhang@intel.com>; Xu, Tao3 <tao3.xu@intel.com>
> Cc: qemu-devel@nongnu.org; Li Zhijian <lizhijian@fujitsu.com>; Peter
> Maydell <peter.maydell@linaro.org>
> Subject: Re: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet
> is not parsed correctly
> 
> 
> 在 2022/7/29 21:58, Peter Maydell 写道:
> > On Wed, 20 Jul 2022 at 10:04, Jason Wang <jasowang@redhat.com> wrote:
> >> From: Zhang Chen <chen.zhang@intel.com>
> >>
> >> When COLO use only one vnet_hdr_support parameter between
> >> filter-redirector and filter-mirror(or colo-compare), COLO will crash
> >> with segmentation fault. Back track as follow:
> >>
> >> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation
> fault.
> >> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
> >>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> >> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
> >> (gdb) bt
> >> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
> >>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> >> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
> >> net/colo.c:49
> >> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
> >> net/filter-rewriter.c:63
> >>
> >> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
> >> raise error and add trace-events to track vnet_hdr_len.
> >>
> >> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > Hi -- Coverity points out that there's a problem with this fix (CID
> > 1490786):
> >
> >> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
> >>       static const uint8_t vlan[] = {0x81, 0x00};
> >>       uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> > data here is set to pkt->data + pkt->vnet_hdr_len.
> > If pkt->data is NULL then this addition is C undefined behaviour, and
> > if pkt->data is not NULL but the integer added is such that the
> > pointer ends up not pointing within data, then that is also C
> > undefined behaviour...
> 
> 
> Right. And I don't see how pkt->data can be NULL, maybe a hint of bug
> elsewhere.
> 
> 
> >
> >>       uint16_t l3_proto;
> >> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> >> +    ssize_t l2hdr_len;
> >> +
> >> +    if (data == NULL) {
> > ...so the compiler is allowed to assume that data cannot be NULL
> > here, and this entire error checking clause is logically dead code.
> >
> > If you're trying to check whether vnet_hdr_len is valid, you
> > need to do that as an integer arithmetic check before you start
> > using it for pointer arithmetic. And you probably want to be
> > checking against some kind of bound, not just for whether the
> > result is going to be NULL.
> 
> 
> Or we can let the caller to validate the Pkt before calling this function.
> 
> 
> >
> > Overall this looks kinda sketchy and could probably use more
> > detailed code review. Where does the bogus vnet_hdr_len come from in
> > the first place? Is this data from the guest, or from the network
> > (ie uncontrolled)?
> 
> 
> If I understand correctly the vnet_hdr_len is set during handshake
> (net_fill_rstate()) which is sent from a network backend.
> 
> Chen or Xu, please post a fix and explain what happens in the changelog.

OK, we will send a patch to fix it.

Thanks
Chen

> 
> Thanks
> 
> 
> >
> >> +        trace_colo_proxy_main_vnet_info("This packet is not parsed
> correctly, "
> >> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
> >> +        return 1;
> >> +    }
> > thanks
> > -- PMM
> >