On 22/10/2021 11:47, Lin Liu wrote:
> The swab() is massively over complicated
> Simplify it with compiler builtins and fallback to plain C function
> if undefined.
> Update components to switch to this new swap bytes.
>
> <snip>
> 34 files changed, 150 insertions(+), 646 deletions(-)
It is worth saying a couple of things.
x86's ___arch__swab64 is wrong. Well, it was mostly OK for 32-bit
builds of Xen, but it is not OK for 64-bit builds. As a consequence, this
series nets an improvement of:
$ ../scripts/bloat-o-meter xen-syms-before xen-syms-after
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-54 (-54)
Function                  old     new   delta
elf_access_unsigned       173     151     -22
unlzo                    1128    1096     -32
Total: Before=3803328, After=3803274, chg -0.00%
because the code generation for bswap64 goes from:
ffff82d0402059b0 <_bswap64>:
ffff82d0402059b0:   48 89 f8       mov    %rdi,%rax
ffff82d0402059b3:   48 c1 e8 20    shr    $0x20,%rax
ffff82d0402059b7:   0f cf          bswap  %edi
ffff82d0402059b9:   0f c8          bswap  %eax
ffff82d0402059bb:   97             xchg   %eax,%edi
ffff82d0402059bc:   48 89 c2       mov    %rax,%rdx
ffff82d0402059bf:   89 f8          mov    %edi,%eax
ffff82d0402059c1:   48 c1 e2 20    shl    $0x20,%rdx
ffff82d0402059c5:   48 09 d0       or     %rdx,%rax
ffff82d0402059c8:   c3             retq
to
ffff82d0402059b0 <_bswap64>:
ffff82d0402059b0:   48 89 f8       mov    %rdi,%rax
ffff82d0402059b3:   48 0f c8       bswap  %rax
ffff82d0402059b6:   c3             retq
Almost all byteswapping is done on 32bit quantities, not 64, which is
why the delta is so small.
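For anyone wanting to see the difference in C terms, here is a rough
sketch of the two shapes. It is not the code from the series: the names
swab64_split() and swab64_builtin() are made up for illustration, and it
assumes a GCC- or Clang-compatible compiler. The first swaps each 32-bit
half and then exchanges the halves, which is the shape of the longer
listing; the second leaves the whole 64-bit swap to the compiler.

```c
#include <stdint.h>

/*
 * Hypothetical name. Swap each 32-bit half separately, then exchange
 * the halves. This is fine when only 32-bit registers are available,
 * but needlessly long-winded on a 64-bit target.
 */
static inline uint64_t swab64_split(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;

    return ((uint64_t)__builtin_bswap32(lo) << 32) | __builtin_bswap32(hi);
}

/*
 * Hypothetical name. Hand the whole 64-bit quantity to the compiler
 * and let it choose the instruction sequence for the target.
 */
static inline uint64_t swab64_builtin(uint64_t x)
{
    return __builtin_bswap64(x);
}
```

Both return the same value for every input; only the generated code
differs.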
However, it also drops a net of roughly 500 lines of code, which is a
demonstration of how silly the swab() infrastructure was. It also removes
the need for per-arch code to do any of this.
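As a rough idea of why no per-arch code is needed, the "builtin, else
plain C" pattern from the commit message can be sketched as below. This
is only an illustration: the name example_bswap32() is hypothetical, and
the __has_builtin() check is an assumption about how the selection might
be done, not necessarily what the series does.

```c
#include <stdint.h>

/* Compilers without __has_builtin() simply take the plain C path. */
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

/* Hypothetical name, not an identifier from the series. */
static inline uint32_t example_bswap32(uint32_t x)
{
#if __has_builtin(__builtin_bswap32)
    /* The compiler knows the best instruction for the target. */
    return __builtin_bswap32(x);
#else
    /* Portable fallback: move each byte to its mirrored position. */
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) | ((x & 0xff000000u) >> 24);
#endif
}
```

Either path gives the same result, so nothing architecture-specific has
to be written by hand.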
I'd say it's safe to go into 4.16, but I'll understand if others want to
push back on that at this point in the release.
~Andrew