SDM revision 087 points out that apparently as of quite some time ago on
Intel hardware BSF and BSR may alter all arithmetic flags, not just ZF.
Because of the inconsistency (and because documentation doesn't look to
be quite right about PF), best we can do is simply take the flag values
from what the processor produces, just like we do for various other
arithmetic insns. (Note also that AMD and Intel have always been
disagreeing on arithmetic flags other than ZF.) To be both safe (against
further anomalies) and consistent, extend this to {L,T}ZCNT as well.
(Emulating the two insns correctly even when underlying hardware doesn't
support it was perhaps nice, but yielded guest-observable
inconsistencies.)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Use emulate_2op_SrcV_srcmem() also for {L,T}ZCNT.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5270,62 +5270,26 @@ x86_emulate(
break;
case X86EMUL_OPC(0x0f, 0xbc): /* bsf or tzcnt */
- {
- bool zf;
-
- asm ( "bsf %2,%0" ASM_FLAG_OUT(, "; setz %1")
- : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
- : "rm" (src.val) );
- _regs.eflags &= ~X86_EFLAGS_ZF;
- if ( (vex.pfx == vex_f3) && vcpu_has_bmi1() )
- {
- _regs.eflags &= ~X86_EFLAGS_CF;
- if ( zf )
- {
- _regs.eflags |= X86_EFLAGS_CF;
- dst.val = op_bytes * 8;
- }
- else if ( !dst.val )
- _regs.eflags |= X86_EFLAGS_ZF;
- }
- else if ( zf )
+ if ( vex.pfx == vex_f3 )
+ emulate_2op_SrcV_srcmem("rep; bsf", src, dst, _regs.eflags);
+ else
{
- _regs.eflags |= X86_EFLAGS_ZF;
- dst.type = OP_NONE;
+ emulate_2op_SrcV_srcmem("bsf", src, dst, _regs.eflags);
+ if ( _regs.eflags & X86_EFLAGS_ZF )
+ dst.type = OP_NONE;
}
break;
- }
case X86EMUL_OPC(0x0f, 0xbd): /* bsr or lzcnt */
- {
- bool zf;
-
- asm ( "bsr %2,%0" ASM_FLAG_OUT(, "; setz %1")
- : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
- : "rm" (src.val) );
- _regs.eflags &= ~X86_EFLAGS_ZF;
- if ( (vex.pfx == vex_f3) && vcpu_has_lzcnt() )
- {
- _regs.eflags &= ~X86_EFLAGS_CF;
- if ( zf )
- {
- _regs.eflags |= X86_EFLAGS_CF;
- dst.val = op_bytes * 8;
- }
- else
- {
- dst.val = op_bytes * 8 - 1 - dst.val;
- if ( !dst.val )
- _regs.eflags |= X86_EFLAGS_ZF;
- }
- }
- else if ( zf )
+ if ( vex.pfx == vex_f3 )
+ emulate_2op_SrcV_srcmem("rep; bsr", src, dst, _regs.eflags);
+ else
{
- _regs.eflags |= X86_EFLAGS_ZF;
- dst.type = OP_NONE;
+ emulate_2op_SrcV_srcmem("bsr", src, dst, _regs.eflags);
+ if ( _regs.eflags & X86_EFLAGS_ZF )
+ dst.type = OP_NONE;
}
break;
- }
case X86EMUL_OPC(0x0f, 0xbe): /* movsx rm8,r{16,32,64} */
/* Recompute DstReg as we may have decoded AH/BH/CH/DH. */
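
For readers following the diff, the result and CF/ZF handling that the removed lines implemented in software can be summarised as a small reference model. This is only a sketch in portable C, not Xen code: `struct bitscan_result`, `truncate_src()` and the `model_*()` helpers are hypothetical names introduced for illustration. It covers only CF and ZF, the two flags the old code touched; the remaining arithmetic flags are deliberately left out, since the point of the patch is that they differ between vendors and are now taken from the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record for illustration; not a Xen type. */
struct bitscan_result {
    unsigned int val;  /* value destined for the destination register */
    bool write_dst;    /* false: destination is left untouched */
    bool cf, zf;       /* only the two flags the removed code handled */
};

/* Truncate src to the operand size (op_bytes is 2, 4 or 8). */
static uint64_t truncate_src(uint64_t src, unsigned int op_bytes)
{
    unsigned int bits = op_bytes * 8;

    return bits < 64 ? src & ((1ULL << bits) - 1) : src;
}

/* TZCNT: count trailing zeroes; CF flags a zero source, ZF a zero result. */
static struct bitscan_result model_tzcnt(uint64_t src, unsigned int op_bytes)
{
    unsigned int bits = op_bytes * 8, n = 0;
    struct bitscan_result r = { .write_dst = true };

    src = truncate_src(src, op_bytes);
    while ( n < bits && !((src >> n) & 1) )
        n++;
    r.val = n;
    r.cf = (n == bits);
    r.zf = (n == 0);
    return r;
}

/* LZCNT: count leading zeroes; same CF/ZF convention as TZCNT. */
static struct bitscan_result model_lzcnt(uint64_t src, unsigned int op_bytes)
{
    unsigned int bits = op_bytes * 8, n = 0;
    struct bitscan_result r = { .write_dst = true };

    src = truncate_src(src, op_bytes);
    while ( n < bits && !((src >> (bits - 1 - n)) & 1) )
        n++;
    r.val = n;
    r.cf = (n == bits);
    r.zf = (n == 0);
    return r;
}

/* BSF: ZF set and destination not written when the source is zero. */
static struct bitscan_result model_bsf(uint64_t src, unsigned int op_bytes)
{
    struct bitscan_result r = { .zf = true };

    if ( !truncate_src(src, op_bytes) )
        return r;
    r = model_tzcnt(src, op_bytes);  /* index of the lowest set bit */
    r.zf = false;                    /* CF is not modelled for BSF/BSR */
    return r;
}

/* BSR: like BSF, but yields the index of the highest set bit. */
static struct bitscan_result model_bsr(uint64_t src, unsigned int op_bytes)
{
    struct bitscan_result r = { .zf = true };

    if ( !truncate_src(src, op_bytes) )
        return r;
    r = model_lzcnt(src, op_bytes);
    r.val = op_bytes * 8 - 1 - r.val;  /* same conversion as the old code */
    r.zf = false;
    return r;
}
```

The `write_dst` field mirrors the `dst.type = OP_NONE` assignment that the new code keeps for plain BSF/BSR when ZF ends up set.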
On 14/07/2025 5:02 pm, Jan Beulich wrote:
> SDM revision 087 points out that apparently as of quite some time ago on
> Intel hardware BSF and BSR may alter all arithmetic flags, not just ZF.
> Because of the inconsistency (and because documentation doesn't look to
It's probably worth saying errata explicitly. There are a whole bunch
of Intel CPUs where the behaviour doesn't match CPUID.
> be quite right about PF), best we can do is simply take the flag values
> from what the processor produces, just like we do for various other
> arithmetic insns. (Note also that AMD and Intel have always been
> disagreeing on arithmetic flags other than ZF.) To be both safe (against
> further anomalies) and consistent, extend this to {L,T}ZCNT as well.
> (Emulating the two insns correctly even when underlying hardware doesn't
> support it was perhaps nice, but yielded guest-observable
> inconsistencies.)
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
This is one of the more messy parts of x86, and that's saying something.
> ---
> v2: Use emulate_2op_SrcV_srcmem() also for {L,T}ZCNT.
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -5270,62 +5270,26 @@ x86_emulate(
> break;
>
> case X86EMUL_OPC(0x0f, 0xbc): /* bsf or tzcnt */
> - {
> - bool zf;
> -
> - asm ( "bsf %2,%0" ASM_FLAG_OUT(, "; setz %1")
> - : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
> - : "rm" (src.val) );
> - _regs.eflags &= ~X86_EFLAGS_ZF;
> - if ( (vex.pfx == vex_f3) && vcpu_has_bmi1() )
> - {
> - _regs.eflags &= ~X86_EFLAGS_CF;
> - if ( zf )
> - {
> - _regs.eflags |= X86_EFLAGS_CF;
> - dst.val = op_bytes * 8;
> - }
> - else if ( !dst.val )
> - _regs.eflags |= X86_EFLAGS_ZF;
> - }
> - else if ( zf )
> + if ( vex.pfx == vex_f3 )
> + emulate_2op_SrcV_srcmem("rep; bsf", src, dst, _regs.eflags);
Do we need the ; ?
We surely don't on 4.21, but I presume there are bugs in older
binutils? (All Clangs back to 3.5 seem happy)
~Andrew
On 14.07.2025 18:19, Andrew Cooper wrote:
> On 14/07/2025 5:02 pm, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -5270,62 +5270,26 @@ x86_emulate(
>> break;
>>
>> case X86EMUL_OPC(0x0f, 0xbc): /* bsf or tzcnt */
>> - {
>> - bool zf;
>> -
>> - asm ( "bsf %2,%0" ASM_FLAG_OUT(, "; setz %1")
>> - : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
>> - : "rm" (src.val) );
>> - _regs.eflags &= ~X86_EFLAGS_ZF;
>> - if ( (vex.pfx == vex_f3) && vcpu_has_bmi1() )
>> - {
>> - _regs.eflags &= ~X86_EFLAGS_CF;
>> - if ( zf )
>> - {
>> - _regs.eflags |= X86_EFLAGS_CF;
>> - dst.val = op_bytes * 8;
>> - }
>> - else if ( !dst.val )
>> - _regs.eflags |= X86_EFLAGS_ZF;
>> - }
>> - else if ( zf )
>> + if ( vex.pfx == vex_f3 )
>> + emulate_2op_SrcV_srcmem("rep; bsf", src, dst, _regs.eflags);
>
> Do we need the ; ?
>
> We surely don't on 4.21, but I presume there are bugs in older
> binutils? (All Clangs back to 3.5 seem happy)
Actually we can use TZCNT here and LZCNT below with gas 2.25 (and Clang
looks to be happy too, even version 3.0). I expect that's preferable over
merely omitting the semicolons?
Jan
On 14.07.2025 18:19, Andrew Cooper wrote:
> On 14/07/2025 5:02 pm, Jan Beulich wrote:
>> SDM revision 087 points out that apparently as of quite some time ago on
>> Intel hardware BSF and BSR may alter all arithmetic flags, not just ZF.
>> Because of the inconsistency (and because documentation doesn't look to
>
> It's probably worth saying errata explicitly. There are a whole bunch
> of Intel CPUs where the behaviour doesn't match CPUID.
Okay, I've adjusted the wording slightly.
>> be quite right about PF), best we can do is simply take the flag values
>> from what the processor produces, just like we do for various other
>> arithmetic insns. (Note also that AMD and Intel have always been
>> disagreeing on arithmetic flags other than ZF.) To be both safe (against
>> further anomalies) and consistent, extend this to {L,T}ZCNT as well.
>> (Emulating the two insns correctly even when underlying hardware doesn't
>> support it was perhaps nice, but yielded guest-observable
>> inconsistencies.)
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Thanks.
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -5270,62 +5270,26 @@ x86_emulate(
>> break;
>>
>> case X86EMUL_OPC(0x0f, 0xbc): /* bsf or tzcnt */
>> - {
>> - bool zf;
>> -
>> - asm ( "bsf %2,%0" ASM_FLAG_OUT(, "; setz %1")
>> - : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
>> - : "rm" (src.val) );
>> - _regs.eflags &= ~X86_EFLAGS_ZF;
>> - if ( (vex.pfx == vex_f3) && vcpu_has_bmi1() )
>> - {
>> - _regs.eflags &= ~X86_EFLAGS_CF;
>> - if ( zf )
>> - {
>> - _regs.eflags |= X86_EFLAGS_CF;
>> - dst.val = op_bytes * 8;
>> - }
>> - else if ( !dst.val )
>> - _regs.eflags |= X86_EFLAGS_ZF;
>> - }
>> - else if ( zf )
>> + if ( vex.pfx == vex_f3 )
>> + emulate_2op_SrcV_srcmem("rep; bsf", src, dst, _regs.eflags);
>
> Do we need the ; ?
>
> We surely don't on 4.21, but I presume there are bugs in older
> binutils? (All Clangs back to 3.5 seem happy)
Right, we don't really need this on staging. I can omit it and then hopefully
not forget to add it back when backporting (which I was intending to do).
Jan