From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Eduardo Habkost, Peter Crosthwaite, Stefan Weil, Alexander Graf, alex.bennee@linaro.org, qemu-arm@nongnu.org, Pranith Kumar, Paolo Bonzini, Aurelien Jarno, Richard Henderson
Date: Tue, 25 Apr 2017 03:53:59 -0400
Message-Id: <1493106839-10438-14-git-send-email-cota@braap.org>
In-Reply-To: <1493106839-10438-1-git-send-email-cota@braap.org>
Subject: [Qemu-devel] [PATCH v2 13/13] tb-hash: improve tb_jmp_cache hash function in user mode
Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
In softmmu mode, the constraint of reserving some hash bits for the
page number (so that TLB invalidation can quickly clear a subset of
the hash table) lowers the achievable quality of the hashing function.

However, user-mode has no TLB and therefore no such requirement. This
change thus gives user-mode a hashing function that is both faster and
of better quality than the previous one.

Measurements:

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

- SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
  [Bar chart: speedup for astar, bzip2, gcc, gobmk, h264ref, hmmer,
  libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean;
  series jr+noinline, jr+inline, jr+hash+noinline, jr+multhash+inline,
  jr+hash+inline; y-axis 0.8x to 2x]
  png: http://imgur.com/1ZJGjzV

Here I also tried the hash function suggested by Paolo ("multhash"):

  return ((uint64_t) (pc * 2654435761) >> 32) & ();

As you can see, it is just as good as the other new function ("hash"),
but I kept "hash" because with it all benchmarks have a speedup > 1.

- SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
  [Bar chart: speedup for the same SPECint06 benchmarks and hmean;
  series jr+inline, jr+inline+hash; y-axis 1x to 2.6x]
  png: http://imgur.com/1D2VFze

- NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
  [Bar chart: speedup per NBench test and hmean; series jr+inline,
  jr+hash+noinline, jr+hash+inline; y-axis 0.96x to 1.1x]
  png: http://imgur.com/xK9YfOB

- NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
  [Bar chart: speedup per NBench test and hmean; series jr+inline,
  jr+hash+inline; y-axis 0.95x to 1.3x]
  png: http://imgur.com/uhIEOA1

Signed-off-by: Emilio G. Cota
Reviewed-by: Richard Henderson
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page. The top bits are the same. This allows
    TLB invalidation to quickly clear a subset of the hash table. */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
-- 
2.7.4
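The claim that reserving page bits limits hash quality can be illustrated with a small standalone sketch. It assumes a 32-bit guest with 4 KiB pages (TARGET_PAGE_BITS = 12) and TB_JMP_CACHE_BITS = 12. softmmu_hash() is a reconstruction modeled on the softmmu tb_jmp_cache_hash_func() whose tail appears in the diff (its full body is not part of this patch, so treat it as an assumption), and user_hash() is the user-mode function the patch adds. The names softmmu_hash, user_hash, buckets_in_page and same_page_slice are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumptions for this sketch: 32-bit guest, 4 KiB pages, 4096-entry cache. */
typedef uint32_t target_ulong;
#define TARGET_PAGE_BITS  12
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* softmmu layout: the top TB_JMP_PAGE_BITS of the index depend only on the page. */
#define TB_JMP_PAGE_BITS  (TB_JMP_CACHE_BITS / 2)
#define TB_JMP_PAGE_SIZE  (1u << TB_JMP_PAGE_BITS)
#define TB_JMP_ADDR_MASK  (TB_JMP_PAGE_SIZE - 1)
#define TB_JMP_PAGE_MASK  (TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE)
#define PAGE_OFFSET_MASK  ((target_ulong)(1u << TARGET_PAGE_BITS) - 1)

/* Reconstructed softmmu hash: (page-derived slice) | (in-page part). */
static unsigned int softmmu_hash(target_ulong pc)
{
    target_ulong tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
    return ((tmp >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS)) & TB_JMP_PAGE_MASK)
           | (tmp & TB_JMP_ADDR_MASK);
}

/* User-mode hash from the patch: fold the page bits into the whole index. */
static unsigned int user_hash(target_ulong pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}

/* Count the distinct cache slots hit by every byte address on pc's page. */
static unsigned int buckets_in_page(unsigned int (*hash)(target_ulong),
                                    target_ulong pc)
{
    static bool seen[TB_JMP_CACHE_SIZE];
    target_ulong base = pc & ~PAGE_OFFSET_MASK;
    unsigned int distinct = 0;

    memset(seen, 0, sizeof(seen));
    for (target_ulong off = 0; off <= PAGE_OFFSET_MASK; off++) {
        unsigned int idx = hash(base + off);
        assert(idx < TB_JMP_CACHE_SIZE);
        if (!seen[idx]) {
            seen[idx] = true;
            distinct++;
        }
    }
    return distinct;
}

/* True if every address on pc's page shares the same top index bits under
 * softmmu_hash(), i.e. the page maps to one TB_JMP_PAGE_SIZE-entry slice
 * that a per-page TLB flush can clear without scanning the whole cache. */
static bool same_page_slice(target_ulong pc)
{
    target_ulong base = pc & ~PAGE_OFFSET_MASK;
    unsigned int slice = softmmu_hash(base) & TB_JMP_PAGE_MASK;

    for (target_ulong off = 1; off <= PAGE_OFFSET_MASK; off++) {
        if ((softmmu_hash(base + off) & TB_JMP_PAGE_MASK) != slice) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, all addresses on one 4 KiB page land in only TB_JMP_PAGE_SIZE (64) slots with softmmu_hash(), while user_hash() spreads them over all 4096 slots; the softmmu layout trades spread for cheap per-page invalidation. This is a counting illustration only, not a substitute for the benchmark numbers above.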
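For comparison, the "multhash" candidate mentioned in the commit message can be sketched the same way. The mask in the quoted expression did not survive, so this version takes it as a parameter; passing TB_JMP_CACHE_SIZE - 1 would be a natural choice but is an assumption here, as are the 64-bit target_ulong and the name multhash(). The patch itself keeps the xor-fold "hash" function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64-bit guest addresses. */
typedef uint64_t target_ulong;

/* Multiplicative hash in the style of the quoted multhash expression:
 * multiply by 2654435761, a constant close to 2^32 divided by the golden
 * ratio, keep bits 32..63 of the 64-bit product, then mask to the table size.
 * The mask parameter is an assumption: it was lost from the quoted line. */
static unsigned int multhash(target_ulong pc, unsigned int mask)
{
    assert((mask & (mask + 1u)) == 0);   /* mask must have the form 2^k - 1 */
    return (unsigned int)(((uint64_t)(pc * UINT64_C(2654435761)) >> 32) & mask);
}
```

The multiplication makes each kept bit depend on many low pc bits, at the cost of a 64-bit multiply per lookup, whereas the xor-fold in the diff needs only a shift, an xor and an and.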