From: Alex Bennée <alex.bennee@linaro.org>
To: pbonzini@redhat.com, stefanha@redhat.com
Date: Wed, 17 May 2017 15:52:58 +0100
Message-Id: <20170517145259.28979-6-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170517145259.28979-1-alex.bennee@linaro.org>
References: <20170517145259.28979-1-alex.bennee@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
charset="utf-8" Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:400c:c09::22e Subject: [Qemu-devel] [PATCH v2 5/6] cputlb: add trace events X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Richard Henderson , cota@braap.org, =?UTF-8?q?Alex=20Benn=C3=A9e?= , qemu-devel@nongnu.org, Peter Crosthwaite Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 Given the range of costs for various SoftMMU TLB operations from deferring work for the currently running vCPU to bring the whole emulated machine to a stop for synchronised updates simple counters are less useful. Instead we log events via the trace infrastructure and we can then post-process the data in a range of ways. tlb_flush_self - the vCPU flushed its own TLB tlb_flush_async_schedule - work was scheduled and the vCPU kicked tlb_flush_synced_schedule - exclusive work was scheduled on a vCPU tlb_flush_work - scheduled work was done We can use the difference between the work being scheduled and tlb_flush_work to calculate the latency introduced. Signed-off-by: Alex Benn=C3=A9e Reviewed-by: Philippe Mathieu-Daud=C3=A9 --- cputlb.c | 34 +++++++++++++++++++++++++++++++++- trace-events | 7 +++++++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index d1859c3f37..b2490863e4 100644 --- a/cputlb.c +++ b/cputlb.c @@ -127,6 +127,7 @@ static void tlb_flush_nocheck(CPUState *cpu) =20 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data dat= a) { + trace_tlb_flush_work(__func__, cpu->cpu_index); tlb_flush_nocheck(cpu); } =20 @@ -135,17 +136,22 @@ void tlb_flush(CPUState *cpu) if (cpu->created && !qemu_cpu_is_self(cpu)) { if (atomic_mb_read(&cpu->pending_tlb_flush) !=3D ALL_MMUIDX_BITS) { atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS); + trace_tlb_flush_async_schedule(__func__, current_cpu ? 
 cputlb.c     | 34 +++++++++++++++++++++++++++++++++-
 trace-events |  7 +++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/cputlb.c b/cputlb.c
index d1859c3f37..b2490863e4 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -127,6 +127,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
 {
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
     tlb_flush_nocheck(cpu);
 }
 
@@ -135,17 +136,22 @@ void tlb_flush(CPUState *cpu)
     if (cpu->created && !qemu_cpu_is_self(cpu)) {
         if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
             atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
+            trace_tlb_flush_async_schedule(__func__, current_cpu ?
+                                           current_cpu->cpu_index :
+                                           cpu->cpu_index, cpu->cpu_index);
             async_run_on_cpu(cpu, tlb_flush_global_async_work,
                              RUN_ON_CPU_NULL);
         }
     } else {
-        tlb_flush_nocheck(cpu);
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
+        tlb_flush_global_async_work(cpu, RUN_ON_CPU_NULL);
     }
 }
 
 void tlb_flush_all_cpus(CPUState *src_cpu)
 {
     const run_on_cpu_func fn = tlb_flush_global_async_work;
+    trace_tlb_flush_async_schedule(__func__, src_cpu->cpu_index, -1);
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_NULL);
     fn(src_cpu, RUN_ON_CPU_NULL);
 }
@@ -153,6 +159,7 @@ void tlb_flush_all_cpus(CPUState *src_cpu)
 void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 {
     const run_on_cpu_func fn = tlb_flush_global_async_work;
+    trace_tlb_flush_synced_schedule(__func__, src_cpu->cpu_index, -1);
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_NULL);
     async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_NULL);
 }
@@ -163,6 +170,8 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
     unsigned long mmu_idx_bitmask = data.host_int;
     int mmu_idx;
 
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
+
     assert_cpu_is_self(cpu);
 
     tb_lock();
@@ -196,12 +205,16 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 
         if (pending_flushes) {
             tlb_debug("reduced mmu_idx: 0x%" PRIx16 "\n", pending_flushes);
+            trace_tlb_flush_async_schedule(__func__,
+                                           current_cpu->cpu_index,
+                                           cpu->cpu_index);
 
             atomic_or(&cpu->pending_tlb_flush, pending_flushes);
             async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
                              RUN_ON_CPU_HOST_INT(pending_flushes));
         }
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap));
     }
@@ -212,6 +225,7 @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, uint16_t idxmap)
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
 
     tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+    trace_tlb_flush_async_schedule(__func__, src_cpu->cpu_index, -1);
 
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
     fn(src_cpu, RUN_ON_CPU_HOST_INT(idxmap));
@@ -223,6 +237,7 @@ void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
     const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
 
     tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+    trace_tlb_flush_synced_schedule(__func__, src_cpu->cpu_index, -1);
 
     flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
     async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
@@ -252,6 +267,7 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
     assert_cpu_is_self(cpu);
 
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
@@ -285,9 +301,12 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
     if (!qemu_cpu_is_self(cpu)) {
+        trace_tlb_flush_async_schedule(__func__, current_cpu->cpu_index,
+                                       cpu->cpu_index);
         async_run_on_cpu(cpu, tlb_flush_page_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr));
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
     }
 }
@@ -312,6 +331,7 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
 
     tlb_debug("page:%d addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
               page, addr, mmu_idx_bitmap);
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
@@ -336,6 +356,7 @@ static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
     unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
 
     tlb_debug("addr:"TARGET_FMT_lx" mmu_idx: %04lx\n", addr, mmu_idx_bitmap);
+    trace_tlb_flush_work(__func__, cpu->cpu_index);
 
     /* Check if we need to flush due to large pages.  */
     if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
@@ -343,9 +364,11 @@ static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
                   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
                   env->tlb_flush_addr, env->tlb_flush_mask);
 
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_by_mmuidx_async_work(cpu,
                                        RUN_ON_CPU_HOST_INT(mmu_idx_bitmap));
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_flush_page_by_mmuidx_async_work(cpu, data);
     }
 }
@@ -361,9 +384,12 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
     addr_and_mmu_idx |= idxmap;
 
     if (!qemu_cpu_is_self(cpu)) {
+        trace_tlb_flush_async_schedule(__func__, current_cpu->cpu_index,
+                                       cpu->cpu_index);
         async_run_on_cpu(cpu, tlb_check_page_and_flush_by_mmuidx_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     } else {
+        trace_tlb_flush_self(__func__, cpu->cpu_index);
         tlb_check_page_and_flush_by_mmuidx_async_work(
             cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     }
@@ -376,6 +402,7 @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    trace_tlb_flush_async_schedule(__func__, src_cpu->cpu_index, -1);
 
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
@@ -393,6 +420,7 @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    trace_tlb_flush_synced_schedule(__func__, src_cpu->cpu_index, -1);
 
     /* This should already be page aligned */
     addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
@@ -406,6 +434,8 @@ void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr)
 {
     const run_on_cpu_func fn = tlb_flush_page_async_work;
 
+    trace_tlb_flush_async_schedule(__func__, src->cpu_index, -1);
+
     flush_all_helper(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
     fn(src, RUN_ON_CPU_TARGET_PTR(addr));
 }
@@ -415,6 +445,8 @@ void tlb_flush_page_all_cpus_synced(CPUState *src,
 {
     const run_on_cpu_func fn = tlb_flush_page_async_work;
 
+    trace_tlb_flush_synced_schedule(__func__, src->cpu_index, -1);
+
     flush_all_helper(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
     async_safe_run_on_cpu(src, fn, RUN_ON_CPU_TARGET_PTR(addr));
 }
diff --git a/trace-events b/trace-events
index e582d6315d..500f84b8c4 100644
--- a/trace-events
+++ b/trace-events
@@ -78,6 +78,13 @@ disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
 disable exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
 disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=%x"
 
+# SoftMMU TLB flushes
+#  - setting to=-1 indicates work scheduled on all vCPUs
+tlb_flush_self(const char *fn, int vcpu) "%s: cpu %d"
+tlb_flush_async_schedule(const char *fn, int from, int to) "%s: from_cpu=%d to_cpu=%d"
+tlb_flush_synced_schedule(const char *fn, int from, int to) "%s: from_cpu=%d to_cpu=%d"
+tlb_flush_work(const char *fn, int vcpu) "%s: cpu %d"
+
 # translate-all.c
 translate_block(void *tb, uintptr_t pc, uint8_t *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p"
 
-- 
2.11.0