Hitting an uretprobe in a s390x TCG guest causes a SIGSEGV. What
happens is:

* uretprobe maps a userspace page containing an invalid instruction.
* uretprobe replaces the target function's return address with the
  address of that page.
* When tb_gen_code() is called on that page, tb->size ends up being 0
  (because the page starts with the invalid instruction), which causes
  virt_page2 to point to the previous page.
* The previous page is not mapped, so this causes a spurious
  translation exception.

The bug is that tb->size must never be 0: even if there is an illegal
instruction, the instruction bytes that have been looked at must count
towards tb->size. So adjust s390x's translate_one() to act this way
for both illegal instructions and instructions that are known to
generate exceptions.

Also add an assertion to tb_gen_code() in order to detect such
situations in future.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---

v1: https://lists.nongnu.org/archive/html/qemu-devel/2021-04/msg02037.html
v1 -> v2: Fix target/s390x instead of trying to tolerate tb->size == 0
          in tb_gen_code().

 accel/tcg/translate-all.c |  1 +
 target/s390x/translate.c  | 16 +++++++++++-----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ba6ab09790..93b2dae112 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1913,6 +1913,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
     tcg_ctx->cpu = env_cpu(env);
     gen_intermediate_code(cpu, tb, max_insns);
+    assert(tb->size != 0);
     tcg_ctx->cpu = NULL;
     max_insns = tb->icount;
 
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 4f953ddfba..e243624d2a 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -6412,7 +6412,8 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
         qemu_log_mask(LOG_UNIMP, "unimplemented opcode 0x%02x%02x\n",
                       s->fields.op, s->fields.op2);
         gen_illegal_opcode(s);
-        return DISAS_NORETURN;
+        ret = DISAS_NORETURN;
+        goto out;
     }
 
 #ifndef CONFIG_USER_ONLY
@@ -6428,7 +6429,8 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
         /* privileged instruction */
         if ((s->base.tb->flags & FLAG_MASK_PSTATE) && (insn->flags & IF_PRIV)) {
             gen_program_exception(s, PGM_PRIVILEGED);
-            return DISAS_NORETURN;
+            ret = DISAS_NORETURN;
+            goto out;
         }
 
         /* if AFP is not enabled, instructions and registers are forbidden */
@@ -6455,7 +6457,8 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
             }
             if (dxc) {
                 gen_data_exception(dxc);
-                return DISAS_NORETURN;
+                ret = DISAS_NORETURN;
+                goto out;
             }
         }
 
@@ -6463,7 +6466,8 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
         if (insn->flags & IF_VEC) {
             if (!((s->base.tb->flags & FLAG_MASK_VECTOR))) {
                 gen_data_exception(0xfe);
-                return DISAS_NORETURN;
+                ret = DISAS_NORETURN;
+                goto out;
             }
         }
 
@@ -6484,7 +6488,8 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
             (insn->spec & SPEC_r1_f128 && !is_fp_pair(get_field(s, r1))) ||
             (insn->spec & SPEC_r2_f128 && !is_fp_pair(get_field(s, r2)))) {
             gen_program_exception(s, PGM_SPECIFICATION);
-            return DISAS_NORETURN;
+            ret = DISAS_NORETURN;
+            goto out;
         }
     }
 
@@ -6544,6 +6549,7 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
     }
 #endif
 
+out:
     /* Advance to the next instruction. */
     s->base.pc_next = s->pc_tmp;
     return ret;
--
2.29.2

On 4/13/21 9:52 AM, Ilya Leoshkevich wrote:
> Hitting an uretprobe in a s390x TCG guest causes a SIGSEGV. [...]

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

On 13.04.21 18:52, Ilya Leoshkevich wrote:
> Hitting an uretprobe in a s390x TCG guest causes a SIGSEGV. [...]

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb

On Tue, 13 Apr 2021 18:52:57 +0200
Ilya Leoshkevich <iii@linux.ibm.com> wrote:

> Hitting an uretprobe in a s390x TCG guest causes a SIGSEGV. [...]
>
>  accel/tcg/translate-all.c |  1 +
>  target/s390x/translate.c  | 16 +++++++++++-----
>  2 files changed, 12 insertions(+), 5 deletions(-)

I assume this bug is not usually hit during normal usage, right? It's
probably not release critical, so I'll line it up for 6.1 instead.

On Wed, 2021-04-14 at 10:38 +0200, Cornelia Huck wrote:
> On Tue, 13 Apr 2021 18:52:57 +0200
> Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Hitting an uretprobe in a s390x TCG guest causes a SIGSEGV. [...]
>
> I assume this bug is not usually hit during normal usage, right? It's
> probably not release critical, so I'll line it up for 6.1 instead.

Yes, I saw it only with uprobes, and then it leads only to a process
crash, not to a kernel crash. Thanks!

On Wed, 2021-04-14 at 11:19 +0200, Ilya Leoshkevich wrote:
> On Wed, 2021-04-14 at 10:38 +0200, Cornelia Huck wrote:
> > [...]
> > I assume this bug is not usually hit during normal usage, right? It's
> > probably not release critical, so I'll line it up for 6.1 instead.
>
> Yes, I saw it only with uprobes, and then it leads only to a process
> crash, not to a kernel crash. Thanks!

Seems like the new assertion triggers on ARM:

https://gitlab.com/cohuck/qemu/-/jobs/1178409450

What are the rules in s390x-next-staging, can we amend the patch, or
only commit a follow-up? In either case, I think we'll need something
like this (untested):

--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -9060,6 +9060,7 @@ static void arm_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
     unsigned int insn;
 
     if (arm_pre_translate_insn(dc)) {
+        dc->base.pc_next += 4;
         return;
     }
 

I'm currently trying to debug this in more detail and test the fix.

On Wed, 14 Apr 2021 12:27:03 +0200
Ilya Leoshkevich <iii@linux.ibm.com> wrote:

> [...]
> Seems like the new assertion triggers on ARM:
>
> https://gitlab.com/cohuck/qemu/-/jobs/1178409450

Yep, I just wanted to make sure it was this patch before complaining :)

> What are the rules in s390x-next-staging, can we amend the patch, or
> only commit a follow-up?

-staging is before I merge properly, so no problem folding something in.

> In either case, I think we'll need something
> like this (untested):
>
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -9060,6 +9060,7 @@ static void arm_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
>      unsigned int insn;
>
>      if (arm_pre_translate_insn(dc)) {
> +        dc->base.pc_next += 4;
>          return;
>      }
>
> I'm currently trying to debug this in more detail and test the fix.

On Wed, 14 Apr 2021 12:39:36 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 14 Apr 2021 12:27:03 +0200
> Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > [...]
> > Seems like the new assertion triggers on ARM:
> >
> > https://gitlab.com/cohuck/qemu/-/jobs/1178409450
>
> Yep, I just wanted to make sure it was this patch before complaining :)
>
> > What are the rules in s390x-next-staging, can we amend the patch, or
> > only commit a follow-up?
>
> -staging is before I merge properly, so no problem folding something in.
>
> > In either case, I think we'll need something
> > like this (untested):
> >
> > --- a/target/arm/translate.c
> > +++ b/target/arm/translate.c
> > @@ -9060,6 +9060,7 @@ static void arm_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
> >      unsigned int insn;
> >
> >      if (arm_pre_translate_insn(dc)) {
> > +        dc->base.pc_next += 4;
> >          return;
> >      }
> >
> > I'm currently trying to debug this in more detail and test the fix.

I'm also seeing a problem on xtensa
(https://gitlab.com/cohuck/qemu/-/jobs/1178409540), but not sure if it
is related to this patch, or more general flakiness.