From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 27 Apr 2017 14:00:00 +0200
Message-Id: <20170427120006.20564-14-rth@twiddle.net>
In-Reply-To: <20170427120006.20564-1-rth@twiddle.net>
References: <20170427120006.20564-1-rth@twiddle.net>
Subject: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
Cc: cota@braap.org

From: "Emilio G. Cota"

Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
The constraint of reserving some bits for the page number
lowers the achievable quality of the hashing function.

However, user-mode does not have this requirement. Thus,
with this change we use for user-mode a hashing function that
is both faster and of better quality than the previous one.

Measurements:

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

- SPECint06 (test set), x86_64-linux-user.
  Host: Intel i7-6700K @ 4.00GHz

  [Bar chart: speedup over baseline (y axis, 0.8x to 2.2x) for the series
  jr, jr+multhash and jr+hash across astar, bzip2, gcc, gobmk, h264ref,
  hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean]
  png: http://imgur.com/4UXTrEc

Here I also tried the hash function suggested by Paolo ("multhash"):

  return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);

As you can see it is just as good as the other new function ("hash"),
which is what I ended up going with.

- SPECint06 (train set), x86_64-linux-user.
  Host: Intel i7-6700K @ 4.00GHz

  [Bar chart: speedup over baseline (y axis, 1x to 2.6x) for the series
  jr and jr+hash across astar, bzip2, gcc, gobmk, h264ref, hmmer,
  libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean]
  png: http://imgur.com/ArCbHqo

- NBench, x86_64-linux-user.
  Host: Intel i7-6700K @ 4.00GHz

  [Bar chart: speedup over baseline (y axis, 0.96x to 1.12x) for the series
  jr and jr+hash across the NBench tests (ASSIGNMENT, BITFIELD, FOURIER,
  FP EMULATION, HUFFMAN, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT,
  STRING SORT) and hmean]
  png: http://imgur.com/ZXFX0hJ

- NBench, arm-linux-user.
  Host: Intel i7-4790K @ 4.00GHz

  [Bar chart: speedup over baseline (y axis, 0.9x to 1.3x) for the series
  jr and jr+hash across the NBench tests (ASSIGNMENT, BITFIELD, FOURIER,
  FP EMULATION, HUFFMAN, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT,
  STRING SORT) and hmean]
  png: http://imgur.com/FfD27ey

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
Message-Id: <1493263764-18657-12-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page.  The top bits are the same.  This allows
    TLB invalidation to quickly clear a subset of the hash table.  */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline uint32_t
 tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
-- 
2.9.3
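
For anyone who wants to try the two candidate functions outside the tree,
here is a minimal standalone sketch of the user-mode hash added above next
to the "multhash" variant from the commit message. The value
TB_JMP_CACHE_BITS = 12, the 64-bit target_ulong typedef, and the helper
names hash_xor_fold and hash_mult are assumptions made for illustration,
not part of this patch.

```c
#include <stdint.h>

/* Assumed for illustration; in QEMU these come from the tree's headers. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* Assumption: a 64-bit guest, so target_ulong is 64 bits wide. */
typedef uint64_t target_ulong;

/* Same expression as the user-mode tb_jmp_cache_hash_func in the patch:
   fold the bits above the cache index into the index, then mask. */
static inline unsigned int hash_xor_fold(target_ulong pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}

/* "multhash" as quoted in the commit message: multiply by a constant,
   take bits 32 and up of the 64-bit product, then mask. */
static inline unsigned int hash_mult(target_ulong pc)
{
    return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);
}
```

With these assumed values, the page-aligned addresses 0x1000 and 0x2000 land
in slots 1 and 2 under hash_xor_fold, whereas masking the low bits alone
would put both in slot 0.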