s390x/tcg: STFLE fixes

[Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by David Hildenbrand 6 years, 8 months ago

The PoP (z14, 7-382) says:
    Doublewords to the right of the doubleword in which the
    highest-numbered facility bit is assigned for a model
    may or may not be stored.

However, stack protection in certain binaries can't deal with that.
"gzip" example code:

f1b4:       a7 08 00 03             lhi     %r0,3
f1b8:       b2 b0 f0 a0             stfle   160(%r15)
f1bc:       e3 20 f0 b2 00 90       llgc    %r2,178(%r15)
f1c2:       c0 2b 00 00 00 01       nilf    %r2,1
f1c8:       b2 4f 00 10             ear     %r1,%a0
f1cc:       b9 14 00 22             lgfr    %r2,%r2
f1d0:       eb 11 00 20 00 0d       sllg    %r1,%r1,32
f1d6:       b2 4f 00 11             ear     %r1,%a1
f1da:       d5 07 f0 b8 10 28       clc     184(8,%r15),40(%r1)
f1e0:       a7 74 00 06             jne     f1ec <file_read@@Base+0x1bc>
f1e4:       eb ef f1 30 00 04       lmg     %r14,%r15,304(%r15)
f1ea:       07 fe                   br      %r14
f1ec:       c0 e5 ff ff 9d 6e       brasl   %r14,2cc8 <__stack_chk_fail@plt>

In QEMU, we currently have:
    max_bytes = 24
the code asks for (3 + 1) doublewords == 32 bytes.

If we write 32 bytes instead of only 24, and return "2 + 1" doublewords
("one less than the number of doulewords needed to contain all of the
 facility bits"), the example code detects a stack corruption.

In my opinion, the code is wrong. However, it seems to work fine on
real machines. So let's limit storing to the minimum of the requested
and the maximum doublewords.

Cc: Stefan Liebler <stli@linux.ibm.com>
Cc: Andreas Krebbel <Andreas.Krebbel@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/misc_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
index 34476134a4..b561c5781b 100644
--- a/target/s390x/misc_helper.c
+++ b/target/s390x/misc_helper.c
@@ -678,7 +678,7 @@ uint32_t HELPER(stfle)(CPUS390XState *env, uint64_t addr)
 
     prepare_stfl();
     max_bytes = ROUND_UP(used_stfl_bytes, 8);
-    for (i = 0; i < count_bytes; ++i) {
+    for (i = 0; i < MIN(count_bytes, max_bytes); ++i) {
         cpu_stb_data_ra(env, addr + i, stfl_bytes[i], ra);
     }
 
-- 
2.20.1

Re: [Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by Richard Henderson 6 years, 8 months ago

On 5/31/19 9:56 AM, David Hildenbrand wrote:
> @@ -678,7 +678,7 @@ uint32_t HELPER(stfle)(CPUS390XState *env, uint64_t addr)
>  
>      prepare_stfl();
>      max_bytes = ROUND_UP(used_stfl_bytes, 8);
> -    for (i = 0; i < count_bytes; ++i) {
> +    for (i = 0; i < MIN(count_bytes, max_bytes); ++i) {

Before the loop, please put something like

    /*
     * The PoP says that doublewords beyond the highest-numbered
     * facility bit may or may not be stored.  However, existing
     * hardware appears to not store the words, and existing
     * software appears to depend on that.
     */

with that,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

Re: [Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by David Hildenbrand 6 years, 8 months ago

On 31.05.19 17:05, Richard Henderson wrote:
> On 5/31/19 9:56 AM, David Hildenbrand wrote:
>> @@ -678,7 +678,7 @@ uint32_t HELPER(stfle)(CPUS390XState *env, uint64_t addr)
>>  
>>      prepare_stfl();
>>      max_bytes = ROUND_UP(used_stfl_bytes, 8);
>> -    for (i = 0; i < count_bytes; ++i) {
>> +    for (i = 0; i < MIN(count_bytes, max_bytes); ++i) {
> 
> Before the loop, please put something like
> 
>     /*
>      * The PoP says that doublewords beyond the highest-numbered
>      * facility bit may or may not be stored.  However, existing
>      * hardware appears to not store the words, and existing
>      * software appears to depend on that.
>      */
> 
> with that,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 

Makes sense, thanks for the ultra-fast review!

(have a great weekend!)


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by Stefan Liebler 6 years, 8 months ago

On 5/31/19 4:56 PM, David Hildenbrand wrote:
> The PoP (z14, 7-382) says:
>      Doublewords to the right of the doubleword in which the
>      highest-numbered facility bit is assigned for a model
>      may or may not be stored.
> 
> However, stack protection in certain binaries can't deal with that.
> "gzip" example code:
> 
> f1b4:       a7 08 00 03             lhi     %r0,3
> f1b8:       b2 b0 f0 a0             stfle   160(%r15)
> f1bc:       e3 20 f0 b2 00 90       llgc    %r2,178(%r15)
> f1c2:       c0 2b 00 00 00 01       nilf    %r2,1
> f1c8:       b2 4f 00 10             ear     %r1,%a0
> f1cc:       b9 14 00 22             lgfr    %r2,%r2
> f1d0:       eb 11 00 20 00 0d       sllg    %r1,%r1,32
> f1d6:       b2 4f 00 11             ear     %r1,%a1
> f1da:       d5 07 f0 b8 10 28       clc     184(8,%r15),40(%r1)
> f1e0:       a7 74 00 06             jne     f1ec <file_read@@Base+0x1bc>
> f1e4:       eb ef f1 30 00 04       lmg     %r14,%r15,304(%r15)
> f1ea:       07 fe                   br      %r14
> f1ec:       c0 e5 ff ff 9d 6e       brasl   %r14,2cc8 <__stack_chk_fail@plt>
> 
> In QEMU, we currently have:
>      max_bytes = 24
> the code asks for (3 + 1) doublewords == 32 bytes.
> 
> If we write 32 bytes instead of only 24, and return "2 + 1" doublewords
> ("one less than the number of doulewords needed to contain all of the
>   facility bits"), the example code detects a stack corruption.
> 
> In my opinion, the code is wrong. However, it seems to work fine on
> real machines. So let's limit storing to the minimum of the requested
> and the maximum doublewords.
Hi David,

Thanks for catching this. I've reported the "gzip" example to Ilya and 
indeed, r0 is setup too large. He will fix it in gzip.

You've mentioned, that this is detected in certain binaries.
Can you please share those occurrences.
Perhaps on future machines with further facility bits, stfle will write 
beyond the provided doubleword-array on a real machine or in qemu.

Bye
Stefan

> 
> Cc: Stefan Liebler <stli@linux.ibm.com>
> Cc: Andreas Krebbel <Andreas.Krebbel@de.ibm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   target/s390x/misc_helper.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
> index 34476134a4..b561c5781b 100644
> --- a/target/s390x/misc_helper.c
> +++ b/target/s390x/misc_helper.c
> @@ -678,7 +678,7 @@ uint32_t HELPER(stfle)(CPUS390XState *env, uint64_t addr)
>   
>       prepare_stfl();
>       max_bytes = ROUND_UP(used_stfl_bytes, 8);
> -    for (i = 0; i < count_bytes; ++i) {
> +    for (i = 0; i < MIN(count_bytes, max_bytes); ++i) {
>           cpu_stb_data_ra(env, addr + i, stfl_bytes[i], ra);
>       }
>   
>

Re: [Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by David Hildenbrand 6 years, 8 months ago

On 03.06.19 12:38, Stefan Liebler wrote:
> On 5/31/19 4:56 PM, David Hildenbrand wrote:
>> The PoP (z14, 7-382) says:
>>      Doublewords to the right of the doubleword in which the
>>      highest-numbered facility bit is assigned for a model
>>      may or may not be stored.
>>
>> However, stack protection in certain binaries can't deal with that.
>> "gzip" example code:
>>
>> f1b4:       a7 08 00 03             lhi     %r0,3
>> f1b8:       b2 b0 f0 a0             stfle   160(%r15)
>> f1bc:       e3 20 f0 b2 00 90       llgc    %r2,178(%r15)
>> f1c2:       c0 2b 00 00 00 01       nilf    %r2,1
>> f1c8:       b2 4f 00 10             ear     %r1,%a0
>> f1cc:       b9 14 00 22             lgfr    %r2,%r2
>> f1d0:       eb 11 00 20 00 0d       sllg    %r1,%r1,32
>> f1d6:       b2 4f 00 11             ear     %r1,%a1
>> f1da:       d5 07 f0 b8 10 28       clc     184(8,%r15),40(%r1)
>> f1e0:       a7 74 00 06             jne     f1ec <file_read@@Base+0x1bc>
>> f1e4:       eb ef f1 30 00 04       lmg     %r14,%r15,304(%r15)
>> f1ea:       07 fe                   br      %r14
>> f1ec:       c0 e5 ff ff 9d 6e       brasl   %r14,2cc8 <__stack_chk_fail@plt>
>>
>> In QEMU, we currently have:
>>      max_bytes = 24
>> the code asks for (3 + 1) doublewords == 32 bytes.
>>
>> If we write 32 bytes instead of only 24, and return "2 + 1" doublewords
>> ("one less than the number of doulewords needed to contain all of the
>>   facility bits"), the example code detects a stack corruption.
>>
>> In my opinion, the code is wrong. However, it seems to work fine on
>> real machines. So let's limit storing to the minimum of the requested
>> and the maximum doublewords.
> Hi David,
> 
> Thanks for catching this. I've reported the "gzip" example to Ilya and 
> indeed, r0 is setup too large. He will fix it in gzip.
> 
> You've mentioned, that this is detected in certain binaries.
> Can you please share those occurrences.

Hi Stafan,

thanks for your reply.

I didn't track all occurrences, it *could* be that it was only gzip in
the background making other processes fail.

For example, the systemd "vitual console setup" unit failed, too, which
was fixed by this change.

I also remember, seeing segfaults in rpmbuild, for example. They only
"changed" with this fix - I m still chasing different errors. :)

You mentioned "He will fix it in gzip", so I assume this is a gzip issue
and not a gcc/glibc/whatever toolchain issue?

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by Stefan Liebler 6 years, 8 months ago

On 6/3/19 12:45 PM, David Hildenbrand wrote:
> On 03.06.19 12:38, Stefan Liebler wrote:
>> On 5/31/19 4:56 PM, David Hildenbrand wrote:
>>> The PoP (z14, 7-382) says:
>>>       Doublewords to the right of the doubleword in which the
>>>       highest-numbered facility bit is assigned for a model
>>>       may or may not be stored.
>>>
>>> However, stack protection in certain binaries can't deal with that.
>>> "gzip" example code:
>>>
>>> f1b4:       a7 08 00 03             lhi     %r0,3
>>> f1b8:       b2 b0 f0 a0             stfle   160(%r15)
>>> f1bc:       e3 20 f0 b2 00 90       llgc    %r2,178(%r15)
>>> f1c2:       c0 2b 00 00 00 01       nilf    %r2,1
>>> f1c8:       b2 4f 00 10             ear     %r1,%a0
>>> f1cc:       b9 14 00 22             lgfr    %r2,%r2
>>> f1d0:       eb 11 00 20 00 0d       sllg    %r1,%r1,32
>>> f1d6:       b2 4f 00 11             ear     %r1,%a1
>>> f1da:       d5 07 f0 b8 10 28       clc     184(8,%r15),40(%r1)
>>> f1e0:       a7 74 00 06             jne     f1ec <file_read@@Base+0x1bc>
>>> f1e4:       eb ef f1 30 00 04       lmg     %r14,%r15,304(%r15)
>>> f1ea:       07 fe                   br      %r14
>>> f1ec:       c0 e5 ff ff 9d 6e       brasl   %r14,2cc8 <__stack_chk_fail@plt>
>>>
>>> In QEMU, we currently have:
>>>       max_bytes = 24
>>> the code asks for (3 + 1) doublewords == 32 bytes.
>>>
>>> If we write 32 bytes instead of only 24, and return "2 + 1" doublewords
>>> ("one less than the number of doulewords needed to contain all of the
>>>    facility bits"), the example code detects a stack corruption.
>>>
>>> In my opinion, the code is wrong. However, it seems to work fine on
>>> real machines. So let's limit storing to the minimum of the requested
>>> and the maximum doublewords.
>> Hi David,
>>
>> Thanks for catching this. I've reported the "gzip" example to Ilya and
>> indeed, r0 is setup too large. He will fix it in gzip.
>>
>> You've mentioned, that this is detected in certain binaries.
>> Can you please share those occurrences.
> 
> Hi Stafan,
> 
> thanks for your reply.
> 
> I didn't track all occurrences, it *could* be that it was only gzip in
> the background making other processes fail.
> 
> For example, the systemd "vitual console setup" unit failed, too, which
> was fixed by this change.
At least "objdump -d /usr/lib/systemd/systemd-vconsole-setup" does not 
contain the stfle instruction, but "ldd 
/usr/lib/systemd/systemd-vconsole-setup" is showing libz.so which could 
also contain Ilya's patches with the stfle instruction (I assume there 
is the same bug as in gzip). But I have no idea if libz is really called.
> 
> I also remember, seeing segfaults in rpmbuild, for example. They only
> "changed" with this fix - I m still chasing different errors. :)
The same applies for rpmbuild.
> 
> You mentioned "He will fix it in gzip", so I assume this is a gzip issue
> and not a gcc/glibc/whatever toolchain issue?
> 
Yes, this is a gzip bug. r0 was initialized with:
(sizeof(array-on-stack) / 8)
instead of:
(sizeof(array-on-stack) / 8) - 1

Ilya will fix it in gzip and zlib.
@Ilya: Please correct me if I'm wrong.

Re: [Qemu-devel] [PATCH v1 2/2] s390x/tcg: Store only the necessary amount of doublewords for STFLE

Posted by David Hildenbrand 6 years, 8 months ago

On 03.06.19 16:39, Stefan Liebler wrote:
> On 6/3/19 12:45 PM, David Hildenbrand wrote:
>> On 03.06.19 12:38, Stefan Liebler wrote:
>>> On 5/31/19 4:56 PM, David Hildenbrand wrote:
>>>> The PoP (z14, 7-382) says:
>>>>       Doublewords to the right of the doubleword in which the
>>>>       highest-numbered facility bit is assigned for a model
>>>>       may or may not be stored.
>>>>
>>>> However, stack protection in certain binaries can't deal with that.
>>>> "gzip" example code:
>>>>
>>>> f1b4:       a7 08 00 03             lhi     %r0,3
>>>> f1b8:       b2 b0 f0 a0             stfle   160(%r15)
>>>> f1bc:       e3 20 f0 b2 00 90       llgc    %r2,178(%r15)
>>>> f1c2:       c0 2b 00 00 00 01       nilf    %r2,1
>>>> f1c8:       b2 4f 00 10             ear     %r1,%a0
>>>> f1cc:       b9 14 00 22             lgfr    %r2,%r2
>>>> f1d0:       eb 11 00 20 00 0d       sllg    %r1,%r1,32
>>>> f1d6:       b2 4f 00 11             ear     %r1,%a1
>>>> f1da:       d5 07 f0 b8 10 28       clc     184(8,%r15),40(%r1)
>>>> f1e0:       a7 74 00 06             jne     f1ec <file_read@@Base+0x1bc>
>>>> f1e4:       eb ef f1 30 00 04       lmg     %r14,%r15,304(%r15)
>>>> f1ea:       07 fe                   br      %r14
>>>> f1ec:       c0 e5 ff ff 9d 6e       brasl   %r14,2cc8 <__stack_chk_fail@plt>
>>>>
>>>> In QEMU, we currently have:
>>>>       max_bytes = 24
>>>> the code asks for (3 + 1) doublewords == 32 bytes.
>>>>
>>>> If we write 32 bytes instead of only 24, and return "2 + 1" doublewords
>>>> ("one less than the number of doulewords needed to contain all of the
>>>>    facility bits"), the example code detects a stack corruption.
>>>>
>>>> In my opinion, the code is wrong. However, it seems to work fine on
>>>> real machines. So let's limit storing to the minimum of the requested
>>>> and the maximum doublewords.
>>> Hi David,
>>>
>>> Thanks for catching this. I've reported the "gzip" example to Ilya and
>>> indeed, r0 is setup too large. He will fix it in gzip.
>>>
>>> You've mentioned, that this is detected in certain binaries.
>>> Can you please share those occurrences.
>>
>> Hi Stafan,
>>
>> thanks for your reply.
>>
>> I didn't track all occurrences, it *could* be that it was only gzip in
>> the background making other processes fail.
>>
>> For example, the systemd "vitual console setup" unit failed, too, which
>> was fixed by this change.
> At least "objdump -d /usr/lib/systemd/systemd-vconsole-setup" does not 
> contain the stfle instruction, but "ldd 
> /usr/lib/systemd/systemd-vconsole-setup" is showing libz.so which could 
> also contain Ilya's patches with the stfle instruction (I assume there 
> is the same bug as in gzip). But I have no idea if libz is really called.
>>
>> I also remember, seeing segfaults in rpmbuild, for example. They only
>> "changed" with this fix - I m still chasing different errors. :)
> The same applies for rpmbuild.
>>
>> You mentioned "He will fix it in gzip", so I assume this is a gzip issue
>> and not a gcc/glibc/whatever toolchain issue?
>>
> Yes, this is a gzip bug. r0 was initialized with:
> (sizeof(array-on-stack) / 8)
> instead of:
> (sizeof(array-on-stack) / 8) - 1
> 
> Ilya will fix it in gzip and zlib.
> @Ilya: Please correct me if I'm wrong.
> 

Makes sense, thanks for the explanation!

-- 

Thanks,

David / dhildenb