From nobody Wed Nov 5 17:38:10 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; dkim=fail spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1496851060190108.69369572066375; Wed, 7 Jun 2017 08:57:40 -0700 (PDT) Received: from localhost ([::1]:44787 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dIdL0-0001mk-FG for importer@patchew.org; Wed, 07 Jun 2017 11:57:38 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48453) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dIdJM-0000aA-U7 for qemu-devel@nongnu.org; Wed, 07 Jun 2017 11:55:58 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dIdJL-0005ZM-7N for qemu-devel@nongnu.org; Wed, 07 Jun 2017 11:55:56 -0400 Received: from mail-qt0-x244.google.com ([2607:f8b0:400d:c0d::244]:33153) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1dIdJL-0005Z9-0S for qemu-devel@nongnu.org; Wed, 07 Jun 2017 11:55:55 -0400 Received: by mail-qt0-x244.google.com with SMTP id w1so1992628qtg.0 for ; Wed, 07 Jun 2017 08:55:54 -0700 (PDT) Received: from bigtime.twiddle.net.com (184-217-116-168.pools.spcsdns.net. [184.217.116.168]) by smtp.gmail.com with ESMTPSA id i28sm1272525qta.52.2017.06.07.08.55.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 07 Jun 2017 08:55:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=AsCQswCx954j+sAL1Aci2LW/E9SdgngUR02mJs6IvBM=; b=MhxASwUZMqH4MhX/H5XzIE5waEQUp966mGXCnW8knF5oZaXLmtxgbAA8p5T4ZSzuJ0 EYI1OfmXdVp5218EnYttH/8gpDlsLypMx7VWgYZRkkjVNnFN3Fmg1CCAVZlE+iDK0r9R gBu9DzAkmunRDgZYgERc2axhJxvI+ylQ085vEzKuVqk7NqmtIdwWN4ZlXRGa32QDXujx Rey+TT6VM0R0jWX63Ds2aakJ0t6UcRfhRLoqKhw5SyQMHCHHJRi5oVTeUlWy8Uq4eZPQ 4EjaG1g9Pg+RqzLOYZaPsX8fDbd9ZwXC5f4IfY20iUhzCX0HNZgzD3CzlQ1MPH0dsjN9 2I3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=AsCQswCx954j+sAL1Aci2LW/E9SdgngUR02mJs6IvBM=; b=dbP5AgSURSa0UXrVTWxYWbnZ4EFahNAUEyr9ymBEM3e3YxshTjuvA9sUx2+TXOCXpc Pn4Hv5v+xlKyUDsmRRnv3wowPR2QiTRyLO/O4lhBjBRVXwcPVL0juOldF5xNK2we+VtI xkTq9JQk7i9E25pkh0RsnD2AvS/Xw5z2kqOhOZdwxkuqaWXwQ5GlqzxiapuuhwoO1/yT h+f5dv6Cq19y25PeVYbJcoYHeCUGLFsq2UGSO5ZSgF2/UTyDcA4EYmGnZ5Exx6EXPVOy qLClASwibNAveHFJiDX8CKzcSwni3mNdra24j4vNWwxTj4SNtgsl8goCye052YZRGZab zEMg== X-Gm-Message-State: AODbwcAHOt/1tpNNGHiu9yGlMI89Ec8JoT004fKPfMMPKxDo5NPR2VNi /DzS56WQ+c0WPtJGGf8= X-Received: by 10.200.40.251 with SMTP id j56mr12366473qtj.186.1496850954150; Wed, 07 Jun 2017 08:55:54 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Wed, 7 Jun 2017 08:55:31 -0700 Message-Id: <20170607155536.1193-3-rth@twiddle.net> X-Mailer: git-send-email 2.9.4 In-Reply-To: <20170607155536.1193-1-rth@twiddle.net> References: <20170607155536.1193-1-rth@twiddle.net> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400d:c0d::244 Subject: [Qemu-devel] [PATCH v4 2/7] tcg: allocate TB structs before the corresponding translated code X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: cota@braap.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: "Emilio G. Cota" Allocating an arbitrarily-sized array of tbs results in either (a) a lot of memory wasted or (b) unnecessary flushes of the code cache when we run out of TB structs in the array. An obvious solution would be to just malloc a TB struct when needed, and keep the TB array as an array of pointers (recall that tb_find_pc() needs the TB array to run in O(log n)). Perhaps a better solution, which is implemented in this patch, is to allocate TB's right before the translated code they describe. This results in some memory waste due to padding to have code and TBs in separate cache lines--for instance, I measured 4.7% of padding in the used portion of code_gen_buffer when booting aarch64 Linux on a host with 64-byte cache lines. However, it can allow for optimizations in some host architectures, since TCG backends could safely assume that the TB and the corresponding translated code are very close to each other in memory. See this message by rth for a detailed explanation: https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html Subject: Re: GSoC 2017 Proposal: TCG performance enhancements Message-ID: <1e67644b-4b30-887e-d329-1848e94c9484@twiddle.net> Suggested-by: Richard Henderson Reviewed-by: Pranith Kumar Signed-off-by: Emilio G. Cota Message-Id: <1496790745-314-3-git-send-email-cota@braap.org> [rth: Simplify the arithmetic in tcg_tb_alloc] Signed-off-by: Richard Henderson --- include/exec/tb-context.h | 3 ++- tcg/tcg.c | 20 ++++++++++++++++++++ tcg/tcg.h | 2 +- translate-all.c | 39 ++++++++++++++++++++++++--------------- 4 files changed, 47 insertions(+), 17 deletions(-) diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h index c7f17f2..25c2afe 100644 --- a/include/exec/tb-context.h +++ b/include/exec/tb-context.h @@ -31,8 +31,9 @@ typedef struct TBContext TBContext; =20 struct TBContext { =20 - TranslationBlock *tbs; + TranslationBlock **tbs; struct qht htable; + size_t tbs_size; int nb_tbs; /* any access to the tbs or the page table must use this lock */ QemuMutex tb_lock; diff --git a/tcg/tcg.c b/tcg/tcg.c index 564292f..3559829 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -383,6 +383,26 @@ void tcg_context_init(TCGContext *s) } } =20 +/* + * Allocate TBs right before their corresponding translated code, making + * sure that TBs and code are on different cache lines. + */ +TranslationBlock *tcg_tb_alloc(TCGContext *s) +{ + uintptr_t align =3D qemu_icache_linesize; + TranslationBlock *tb; + void *next; + + tb =3D (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, align); + next =3D (void *)ROUND_UP((uintptr_t)(tb + 1), align); + + if (unlikely(next > s->code_gen_highwater)) { + return NULL; + } + s->code_gen_ptr =3D next; + return tb; +} + void tcg_prologue_init(TCGContext *s) { size_t prologue_size, total_size; diff --git a/tcg/tcg.h b/tcg/tcg.h index 5ec48d1..9e37722 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -697,7 +697,6 @@ struct TCGContext { here, because there's too much arithmetic throughout that relies on addition and subtraction working on bytes. Rely on the GCC extension that allows arithmetic on void*. */ - int code_gen_max_blocks; void *code_gen_prologue; void *code_gen_epilogue; void *code_gen_buffer; @@ -756,6 +755,7 @@ static inline bool tcg_op_buf_full(void) /* tb_lock must be held for tcg_malloc_internal. */ void *tcg_malloc_internal(TCGContext *s, int size); void tcg_pool_reset(TCGContext *s); +TranslationBlock *tcg_tb_alloc(TCGContext *s); =20 void tb_lock(void); void tb_unlock(void); diff --git a/translate-all.c b/translate-all.c index b3ee876..bb094ad 100644 --- a/translate-all.c +++ b/translate-all.c @@ -781,12 +781,13 @@ static inline void code_gen_alloc(size_t tb_size) exit(1); } =20 - /* Estimate a good size for the number of TBs we can support. We - still haven't deducted the prologue from the buffer size here, - but that's minimal and won't affect the estimate much. */ - tcg_ctx.code_gen_max_blocks - =3D tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE; - tcg_ctx.tb_ctx.tbs =3D g_new(TranslationBlock, tcg_ctx.code_gen_max_bl= ocks); + /* size this conservatively -- realloc later if needed */ + tcg_ctx.tb_ctx.tbs_size =3D + tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE / 8; + if (unlikely(!tcg_ctx.tb_ctx.tbs_size)) { + tcg_ctx.tb_ctx.tbs_size =3D 64 * 1024; + } + tcg_ctx.tb_ctx.tbs =3D g_new(TranslationBlock *, tcg_ctx.tb_ctx.tbs_si= ze); =20 qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock); } @@ -828,13 +829,20 @@ bool tcg_enabled(void) static TranslationBlock *tb_alloc(target_ulong pc) { TranslationBlock *tb; + TBContext *ctx; =20 assert_tb_locked(); =20 - if (tcg_ctx.tb_ctx.nb_tbs >=3D tcg_ctx.code_gen_max_blocks) { + tb =3D tcg_tb_alloc(&tcg_ctx); + if (unlikely(tb =3D=3D NULL)) { return NULL; } - tb =3D &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++]; + ctx =3D &tcg_ctx.tb_ctx; + if (unlikely(ctx->nb_tbs =3D=3D ctx->tbs_size)) { + ctx->tbs_size *=3D 2; + ctx->tbs =3D g_renew(TranslationBlock *, ctx->tbs, ctx->tbs_size); + } + ctx->tbs[ctx->nb_tbs++] =3D tb; tb->pc =3D pc; tb->cflags =3D 0; tb->invalid =3D false; @@ -850,8 +858,10 @@ void tb_free(TranslationBlock *tb) Ignore the hard cases and just back up if this TB happens to be the last one generated. */ if (tcg_ctx.tb_ctx.nb_tbs > 0 && - tb =3D=3D &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) { - tcg_ctx.code_gen_ptr =3D tb->tc_ptr; + tb =3D=3D tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) { + size_t struct_size =3D ROUND_UP(sizeof(*tb), qemu_icache_linesize); + + tcg_ctx.code_gen_ptr =3D tb->tc_ptr - struct_size; tcg_ctx.tb_ctx.nb_tbs--; } } @@ -1666,7 +1676,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr) m_max =3D tcg_ctx.tb_ctx.nb_tbs - 1; while (m_min <=3D m_max) { m =3D (m_min + m_max) >> 1; - tb =3D &tcg_ctx.tb_ctx.tbs[m]; + tb =3D tcg_ctx.tb_ctx.tbs[m]; v =3D (uintptr_t)tb->tc_ptr; if (v =3D=3D tc_ptr) { return tb; @@ -1676,7 +1686,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr) m_min =3D m + 1; } } - return &tcg_ctx.tb_ctx.tbs[m_max]; + return tcg_ctx.tb_ctx.tbs[m_max]; } =20 #if !defined(CONFIG_USER_ONLY) @@ -1874,7 +1884,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fpr= intf) direct_jmp_count =3D 0; direct_jmp2_count =3D 0; for (i =3D 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) { - tb =3D &tcg_ctx.tb_ctx.tbs[i]; + tb =3D tcg_ctx.tb_ctx.tbs[i]; target_code_size +=3D tb->size; if (tb->size > max_target_code_size) { max_target_code_size =3D tb->size; @@ -1894,8 +1904,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fpr= intf) cpu_fprintf(f, "gen code size %td/%zd\n", tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer); - cpu_fprintf(f, "TB count %d/%d\n", - tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.code_gen_max_blocks); + cpu_fprintf(f, "TB count %d\n", tcg_ctx.tb_ctx.nb_tbs); cpu_fprintf(f, "TB avg target size %d max=3D%d bytes\n", tcg_ctx.tb_ctx.nb_tbs ? target_code_size / tcg_ctx.tb_ctx.nb_tbs : 0, --=20 2.9.4