From: Alex Bennée <alex.bennee@linaro.org>
To: pbonzini@redhat.com, stefanha@redhat.com
Date: Wed, 17 May 2017 15:52:58 +0100
Message-Id: <20170517145259.28979-6-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170517145259.28979-1-alex.bennee@linaro.org>
References: <20170517145259.28979-1-alex.bennee@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
charset="utf-8" Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:400c:c09::22e Subject: [Qemu-devel] [PATCH v2 5/6] cputlb: add trace events X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Richard Henderson , cota@braap.org, =?UTF-8?q?Alex=20Benn=C3=A9e?= , qemu-devel@nongnu.org, Peter Crosthwaite Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 Given the range of costs for various SoftMMU TLB operations from deferring work for the currently running vCPU to bring the whole emulated machine to a stop for synchronised updates simple counters are less useful. Instead we log events via the trace infrastructure and we can then post-process the data in a range of ways. tlb_flush_self - the vCPU flushed its own TLB tlb_flush_async_schedule - work was scheduled and the vCPU kicked tlb_flush_synced_schedule - exclusive work was scheduled on a vCPU tlb_flush_work - scheduled work was done We can use the difference between the work being scheduled and tlb_flush_work to calculate the latency introduced. Signed-off-by: Alex Benn=C3=A9e Reviewed-by: Philippe Mathieu-Daud=C3=A9 --- cputlb.c | 34 +++++++++++++++++++++++++++++++++- trace-events | 7 +++++++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index d1859c3f37..b2490863e4 100644 --- a/cputlb.c +++ b/cputlb.c @@ -127,6 +127,7 @@ static void tlb_flush_nocheck(CPUState *cpu) =20 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data dat= a) { + trace_tlb_flush_work(__func__, cpu->cpu_index); tlb_flush_nocheck(cpu); } =20 @@ -135,17 +136,22 @@ void tlb_flush(CPUState *cpu) if (cpu->created && !qemu_cpu_is_self(cpu)) { if (atomic_mb_read(&cpu->pending_tlb_flush) !=3D ALL_MMUIDX_BITS) { atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS); + trace_tlb_flush_async_schedule(__func__, current_cpu ? 
 cputlb.c     | 34 +++++++++++++++++++++++++++++++++-
 trace-events |  7 +++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/cputlb.c b/cputlb.c
index d1859c3f37..b2490863e4 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -127,6 +127,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
 {
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
     tlb_flush_nocheck(cpu);
 }
 
@@ -135,17 +136,22 @@ void tlb_flush(CPUState *cpu)
     if (cpu->created && !qemu_cpu_is_self(cpu)) {
         if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
             atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
+            trace_tlb_flush_async_schedule(__func__, current_cpu ?
+                                           current_cpu->cpu_index :
+                                           cpu->cpu_index, cpu->cpu_index);
             async_run_on_cpu(cpu, tlb_flush_global_async_work,
                              RUN_ON_CPU_NULL);
         }
     } else {
-        tlb_flush_nocheck(cpu);
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
+        tlb_flush_global_async_work(cpu, RUN_ON_CPU_NULL);
     }
 }
 
 void tlb_flush_all_cpus(CPUState *src_cpu)
 {
     const run_on_cpu_func fn = tlb_flush_global_async_work;
+    trace_tlb_flush_async_schedule(__func__, src_cpu->cpu_index, -1);
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_NULL);
     fn(src_cpu, RUN_ON_CPU_NULL);
 }
@@ -153,6 +159,7 @@ void tlb_flush_all_cpus(CPUState *src_cpu)
 void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 {
     const run_on_cpu_func fn = tlb_flush_global_async_work;
+    trace_tlb_flush_synced_schedule(__func__, src_cpu->cpu_index, -1);
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_NULL);
     async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_NULL);
 }
@@ -163,6 +170,8 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
     unsigned long mmu_idx_bitmask = data.host_int;
     int mmu_idx;
 
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
+
     assert_cpu_is_self(cpu);
 
     tb_lock();
@@ -196,12 +205,16 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 
         if (pending_flushes) {
             tlb_debug("reduced mmu_idx: 0x%" PRIx16 "\n", pending_flushes);
+            trace_tlb_flush_async_schedule(__func__,
+                                           current_cpu->cpu_index,
+                                           cpu->cpu_index);
 
             atomic_or(&cpu->pending_tlb_flush, pending_flushes);
             async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
                              RUN_ON_CPU_HOST_INT(pending_flushes));
         }
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap));
     }
@@ -212,6 +225,7 @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, uint16_t idxmap)
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
 
     tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+    trace_tlb_flush_async_schedule(__func__, src_cpu->cpu_index, -1);
 
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
     fn(src_cpu, RUN_ON_CPU_HOST_INT(idxmap));
@@ -223,6 +237,7 @@ void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
 
     tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+    trace_tlb_flush_synced_schedule(__func__, src_cpu->cpu_index, -1);
 
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
     async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
@@ -252,6 +267,7 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
     assert_cpu_is_self(cpu);
 
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
@@ -285,9 +301,12 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
     if (!qemu_cpu_is_self(cpu)) {
+        trace_tlb_flush_async_schedule(__func__, current_cpu->cpu_index,
+                                       cpu->cpu_index);
         async_run_on_cpu(cpu, tlb_flush_page_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr));
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
     }
 }
@@ -312,6 +331,7 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
 
     tlb_debug("page:%d addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
               page, addr, mmu_idx_bitmap);
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
@@ -336,6 +356,7 @@ static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
     unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
 
     tlb_debug("addr:"TARGET_FMT_lx" mmu_idx: %04lx\n", addr, mmu_idx_bitmap);
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
@@ -343,9 +364,11 @@ static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
                   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
                   env->tlb_flush_addr, env->tlb_flush_mask);
 
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_by_mmuidx_async_work(cpu,
                                        RUN_ON_CPU_HOST_INT(mmu_idx_bitmap));
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_page_by_mmuidx_async_work(cpu, data);
     }
 }
@@ -361,9 +384,12 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
     addr_and_mmu_idx |= idxmap;
 
     if (!qemu_cpu_is_self(cpu)) {
+        trace_tlb_flush_async_schedule(__func__, current_cpu->cpu_index,
+                                       cpu->cpu_index);
         async_run_on_cpu(cpu, tlb_check_page_and_flush_by_mmuidx_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_check_page_and_flush_by_mmuidx_async_work(
             cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     }
@@ -376,6 +402,7 @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    trace_tlb_flush_async_schedule(__func__, src_cpu->cpu_index, -1);
 
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
@@ -393,6 +420,7 @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    trace_tlb_flush_synced_schedule(__func__, src_cpu->cpu_index, -1);
 
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
@@ -406,6 +434,8 @@ void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr)
 {
     const run_on_cpu_func fn = tlb_flush_page_async_work;
 
+    trace_tlb_flush_async_schedule(__func__, src->cpu_index, -1);
+
     flush_all_helper(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
     fn(src, RUN_ON_CPU_TARGET_PTR(addr));
 }
@@ -415,6 +445,8 @@ void tlb_flush_page_all_cpus_synced(CPUState *src,
 {
     const run_on_cpu_func fn = tlb_flush_page_async_work;
 
+    trace_tlb_flush_synced_schedule(__func__, src->cpu_index, -1);
+
     flush_all_helper(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
     async_safe_run_on_cpu(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
 }
diff --git a/trace-events b/trace-events
index e582d6315d..500f84b8c4 100644
--- a/trace-events
+++ b/trace-events
@@ -78,6 +78,13 @@ disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
 disable exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
 disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=%x"
 
+# SoftMMU TLB flushes
+#  - setting to=-1 indicates work scheduled on all vCPUs
+tlb_flush_self(const char *fn, int vcpu) "%s: cpu %d"
+tlb_flush_async_schedule(const char *fn, int from, int to) "%s: from_cpu=%d to_cpu=%d"
+tlb_flush_synced_schedule(const char *fn, int from, int to) "%s: from_cpu=%d to_cpu=%d"
+tlb_flush_work(const char *fn, int vcpu) "%s: cpu %d"
+
 # translate-all.c
 translate_block(void *tb, uintptr_t pc, uint8_t *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p"
 
-- 
2.11.0