This is mostly my code_gen_buffer cleanup, plus a few other random
changes thrown in. Including a fix for a recent float32_exp2 bug.

r~

The following changes since commit 894fc4fd670aaf04a67dc7507739f914ff4bacf2:

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging (2021-06-11 09:21:48 +0100)

are available in the Git repository at:

https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210611

for you to fetch changes up to 60afaddc208d34f6dc86dd974f6e02724fba6eb6:

docs/devel: Explain in more detail the TB chaining mechanisms (2021-06-11 09:41:25 -0700)

----------------------------------------------------------------
Clean up code_gen_buffer allocation.
Add tcg_remove_ops_after.
Fix tcg_constant_* documentation.
Improve TB chaining documentation.
Fix float32_exp2.

----------------------------------------------------------------
Jose R. Ziviani (1):
      tcg/arm: Fix tcg_out_op function signature

Luis Pires (1):
      docs/devel: Explain in more detail the TB chaining mechanisms

Richard Henderson (32):
      meson: Split out tcg/meson.build
      meson: Split out fpu/meson.build
      tcg: Re-order tcg_region_init vs tcg_prologue_init
      tcg: Remove error return from tcg_region_initial_alloc__locked
      tcg: Split out tcg_region_initial_alloc
      tcg: Split out tcg_region_prologue_set
      tcg: Split out region.c
      accel/tcg: Inline cpu_gen_init
      accel/tcg: Move alloc_code_gen_buffer to tcg/region.c
      accel/tcg: Rename tcg_init to tcg_init_machine
      tcg: Create tcg_init
      accel/tcg: Merge tcg_exec_init into tcg_init_machine
      accel/tcg: Use MiB in tcg_init_machine
      accel/tcg: Pass down max_cpus to tcg_init
      tcg: Introduce tcg_max_ctxs
      tcg: Move MAX_CODE_GEN_BUFFER_SIZE to tcg-target.h
      tcg: Replace region.end with region.total_size
      tcg: Rename region.start to region.after_prologue
      tcg: Tidy tcg_n_regions
      tcg: Tidy split_cross_256mb
      tcg: Move in_code_gen_buffer and tests to region.c
      tcg: Allocate code_gen_buffer into struct tcg_region_state
      tcg: Return the map protection from alloc_code_gen_buffer
      tcg: Sink qemu_madvise call to common code
      util/osdep: Add qemu_mprotect_rw
      tcg: Round the tb_size default from qemu_get_host_physmem
      tcg: Merge buffer protection and guard page protection
      tcg: When allocating for !splitwx, begin with PROT_NONE
      tcg: Move tcg_init_ctx and tcg_ctx from accel/tcg/
      tcg: Introduce tcg_remove_ops_after
      tcg: Fix documentation for tcg_constant_* vs tcg_temp_free_*
      softfloat: Fix tp init in float32_exp2

docs/devel/tcg.rst | 101 ++++-
meson.build | 12 +-
accel/tcg/internal.h | 2 +
include/qemu/osdep.h | 1 +
include/sysemu/tcg.h | 2 -
include/tcg/tcg.h | 28 +-
tcg/aarch64/tcg-target.h | 1 +
tcg/arm/tcg-target.h | 1 +
tcg/i386/tcg-target.h | 2 +
tcg/mips/tcg-target.h | 6 +
tcg/ppc/tcg-target.h | 2 +
tcg/riscv/tcg-target.h | 1 +
tcg/s390/tcg-target.h | 3 +
tcg/sparc/tcg-target.h | 1 +
tcg/tcg-internal.h | 40 ++
tcg/tci/tcg-target.h | 1 +
accel/tcg/tcg-all.c | 32 +-
accel/tcg/translate-all.c | 439 +-------------------
bsd-user/main.c | 3 +-
fpu/softfloat.c | 2 +-
linux-user/main.c | 1 -
tcg/region.c | 999 ++++++++++++++++++++++++++++++++++++++++++++++
tcg/tcg.c | 649 +++---------------------
util/osdep.c | 9 +
tcg/arm/tcg-target.c.inc | 3 +-
fpu/meson.build | 1 +
tcg/meson.build | 14 +
27 files changed, 1266 insertions(+), 1090 deletions(-)
create mode 100644 tcg/tcg-internal.h
create mode 100644 tcg/region.c
create mode 100644 fpu/meson.build
create mode 100644 tcg/meson.build

meson: Split out tcg/meson.build

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
meson.build | 8 +-------
tcg/meson.build | 13 +++++++++++++
2 files changed, 14 insertions(+), 7 deletions(-)
create mode 100644 tcg/meson.build

diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ common_ss.add(capstone)
specific_ss.add(files('cpu.c', 'disas.c', 'gdbstub.c'), capstone)
specific_ss.add(when: 'CONFIG_TCG', if_true: files(
'fpu/softfloat.c',
- 'tcg/optimize.c',
- 'tcg/tcg-common.c',
- 'tcg/tcg-op-gvec.c',
- 'tcg/tcg-op-vec.c',
- 'tcg/tcg-op.c',
- 'tcg/tcg.c',
))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))

# Work around a gcc bug/misfeature wherein constant propagation looks
# through an alias:
@@ -XXX,XX +XXX,XX @@ subdir('net')
subdir('replay')
subdir('semihosting')
subdir('hw')
+subdir('tcg')
subdir('accel')
subdir('plugins')
subdir('bsd-user')
diff --git a/tcg/meson.build b/tcg/meson.build
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tcg/meson.build
@@ -XXX,XX +XXX,XX @@
+tcg_ss = ss.source_set()
+
+tcg_ss.add(files(
+ 'optimize.c',
+ 'tcg.c',
+ 'tcg-common.c',
+ 'tcg-op.c',
+ 'tcg-op-gvec.c',
+ 'tcg-op-vec.c',
+))
+tcg_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tci.c'))
+
+specific_ss.add_all(when: 'CONFIG_TCG', if_true: tcg_ss)
--
2.25.1

meson: Split out fpu/meson.build

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
meson.build | 4 +---
fpu/meson.build | 1 +
2 files changed, 2 insertions(+), 3 deletions(-)
create mode 100644 fpu/meson.build

diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ subdir('softmmu')

common_ss.add(capstone)
specific_ss.add(files('cpu.c', 'disas.c', 'gdbstub.c'), capstone)
-specific_ss.add(when: 'CONFIG_TCG', if_true: files(
- 'fpu/softfloat.c',
-))

# Work around a gcc bug/misfeature wherein constant propagation looks
# through an alias:
@@ -XXX,XX +XXX,XX @@ subdir('replay')
subdir('semihosting')
subdir('hw')
subdir('tcg')
+subdir('fpu')
subdir('accel')
subdir('plugins')
subdir('bsd-user')
diff --git a/fpu/meson.build b/fpu/meson.build
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/fpu/meson.build
@@ -0,0 +1 @@
+specific_ss.add(when: 'CONFIG_TCG', if_true: files('softfloat.c'))
--
2.25.1

tcg: Re-order tcg_region_init vs tcg_prologue_init

Instead of delaying tcg_region_init until after tcg_prologue_init
is complete, do tcg_region_init first and let tcg_prologue_init
shrink the first region by the size of the generated prologue.
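
As a rough, self-contained C model of the new ordering (hypothetical
names and sizes, not the actual QEMU/TCG API): regions are carved out
of the buffer first, and emitting the prologue afterwards simply
advances the start of region 0.

    /* Toy model only; compiles with any C99 compiler. */
    #include <assert.h>
    #include <stdio.h>

    #define BUF_SIZE   4096
    #define REGION_CNT 4

    static unsigned char buf[BUF_SIZE];
    static struct { unsigned char *start, *end; } regions[REGION_CNT];

    static void region_init(void)                   /* now runs first */
    {
        size_t per = BUF_SIZE / REGION_CNT;
        for (int i = 0; i < REGION_CNT; i++) {
            regions[i].start = buf + i * per;
            regions[i].end   = buf + (i + 1) * per;
        }
    }

    static void prologue_init(size_t prologue_size) /* runs second */
    {
        /* The prologue is emitted at the start of region 0, which is
         * then shrunk so translated code begins right after it. */
        assert(prologue_size < (size_t)(regions[0].end - regions[0].start));
        regions[0].start += prologue_size;
    }

    int main(void)
    {
        region_init();
        prologue_init(128);
        printf("region 0 now starts %td bytes into the buffer\n",
               regions[0].start - buf);
        return 0;
    }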

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/tcg-all.c | 11 ---------
accel/tcg/translate-all.c | 3 +++
bsd-user/main.c | 1 -
linux-user/main.c | 1 -
tcg/tcg.c | 52 ++++++++++++++-------------------
5 files changed, 22 insertions(+), 46 deletions(-)
15
16
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/accel/tcg/tcg-all.c
19
+++ b/accel/tcg/tcg-all.c
20
@@ -XXX,XX +XXX,XX @@ static int tcg_init(MachineState *ms)
21
22
tcg_exec_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
23
mttcg_enabled = s->mttcg_enabled;
24
-
25
- /*
26
- * Initialize TCG regions only for softmmu.
27
- *
28
- * This needs to be done later for user mode, because the prologue
29
- * generation needs to be delayed so that GUEST_BASE is already set.
30
- */
31
-#ifndef CONFIG_USER_ONLY
32
- tcg_region_init();
33
-#endif /* !CONFIG_USER_ONLY */
34
-
35
return 0;
36
}
37
38
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/accel/tcg/translate-all.c
41
+++ b/accel/tcg/translate-all.c
42
@@ -XXX,XX +XXX,XX @@ void tcg_exec_init(unsigned long tb_size, int splitwx)
43
splitwx, &error_fatal);
44
assert(ok);
45
46
+ /* TODO: allocating regions is hand-in-glove with code_gen_buffer. */
47
+ tcg_region_init();
48
+
49
#if defined(CONFIG_SOFTMMU)
50
/* There's no guest base to take into account, so go ahead and
51
initialize the prologue now. */
52
diff --git a/bsd-user/main.c b/bsd-user/main.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/bsd-user/main.c
55
+++ b/bsd-user/main.c
56
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
57
* the real value of GUEST_BASE into account.
58
*/
59
tcg_prologue_init(tcg_ctx);
60
- tcg_region_init();
61
62
/* build Task State */
63
memset(ts, 0, sizeof(TaskState));
64
diff --git a/linux-user/main.c b/linux-user/main.c
65
index XXXXXXX..XXXXXXX 100644
66
--- a/linux-user/main.c
67
+++ b/linux-user/main.c
68
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv, char **envp)
69
generating the prologue until now so that the prologue can take
70
the real value of GUEST_BASE into account. */
71
tcg_prologue_init(tcg_ctx);
72
- tcg_region_init();
73
74
target_cpu_copy_regs(env, regs);
75
76
diff --git a/tcg/tcg.c b/tcg/tcg.c
77
index XXXXXXX..XXXXXXX 100644
78
--- a/tcg/tcg.c
79
+++ b/tcg/tcg.c
80
@@ -XXX,XX +XXX,XX @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
81
82
void tcg_prologue_init(TCGContext *s)
83
{
84
- size_t prologue_size, total_size;
85
- void *buf0, *buf1;
86
+ size_t prologue_size;
87
88
/* Put the prologue at the beginning of code_gen_buffer. */
89
- buf0 = s->code_gen_buffer;
90
- total_size = s->code_gen_buffer_size;
91
- s->code_ptr = buf0;
92
- s->code_buf = buf0;
93
+ tcg_region_assign(s, 0);
94
+ s->code_ptr = s->code_gen_ptr;
95
+ s->code_buf = s->code_gen_ptr;
96
s->data_gen_ptr = NULL;
97
98
- /*
99
- * The region trees are not yet configured, but tcg_splitwx_to_rx
100
- * needs the bounds for an assert.
101
- */
102
- region.start = buf0;
103
- region.end = buf0 + total_size;
104
-
105
#ifndef CONFIG_TCG_INTERPRETER
106
- tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_splitwx_to_rx(buf0);
107
+ tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_splitwx_to_rx(s->code_ptr);
108
#endif
109
110
- /* Compute a high-water mark, at which we voluntarily flush the buffer
111
- and start over. The size here is arbitrary, significantly larger
112
- than we expect the code generation for any one opcode to require. */
113
- s->code_gen_highwater = s->code_gen_buffer + (total_size - TCG_HIGHWATER);
114
-
115
#ifdef TCG_TARGET_NEED_POOL_LABELS
116
s->pool_labels = NULL;
117
#endif
118
@@ -XXX,XX +XXX,XX @@ void tcg_prologue_init(TCGContext *s)
119
}
120
#endif
121
122
- buf1 = s->code_ptr;
123
+ prologue_size = tcg_current_code_size(s);
124
+
125
#ifndef CONFIG_TCG_INTERPRETER
126
- flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(buf0), (uintptr_t)buf0,
127
- tcg_ptr_byte_diff(buf1, buf0));
128
+ flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(s->code_buf),
129
+ (uintptr_t)s->code_buf, prologue_size);
130
#endif
131
132
- /* Deduct the prologue from the buffer. */
133
- prologue_size = tcg_current_code_size(s);
134
- s->code_gen_ptr = buf1;
135
- s->code_gen_buffer = buf1;
136
- s->code_buf = buf1;
137
- total_size -= prologue_size;
138
- s->code_gen_buffer_size = total_size;
139
+ /* Deduct the prologue from the first region. */
140
+ region.start = s->code_ptr;
141
142
- tcg_register_jit(tcg_splitwx_to_rx(s->code_gen_buffer), total_size);
143
+ /* Recompute boundaries of the first region. */
144
+ tcg_region_assign(s, 0);
145
+
146
+ tcg_register_jit(tcg_splitwx_to_rx(region.start),
147
+ region.end - region.start);
148
149
#ifdef DEBUG_DISAS
150
if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM)) {
151
FILE *logfile = qemu_log_lock();
152
qemu_log("PROLOGUE: [size=%zu]\n", prologue_size);
153
if (s->data_gen_ptr) {
154
- size_t code_size = s->data_gen_ptr - buf0;
155
+ size_t code_size = s->data_gen_ptr - s->code_gen_ptr;
156
size_t data_size = prologue_size - code_size;
157
size_t i;
158
159
- log_disas(buf0, code_size);
160
+ log_disas(s->code_gen_ptr, code_size);
161
162
for (i = 0; i < data_size; i += sizeof(tcg_target_ulong)) {
163
if (sizeof(tcg_target_ulong) == 8) {
164
@@ -XXX,XX +XXX,XX @@ void tcg_prologue_init(TCGContext *s)
165
}
166
}
167
} else {
168
- log_disas(buf0, prologue_size);
169
+ log_disas(s->code_gen_ptr, prologue_size);
170
}
171
qemu_log("\n");
172
qemu_log_flush();
173
--
174
2.25.1
175
176

tcg: Remove error return from tcg_region_initial_alloc__locked

All callers immediately assert on error, so move the assert
into the function itself.
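
A minimal sketch of the shape of this refactoring, using made-up names
rather than the real QEMU functions:

    /* Toy model only. */
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    static bool try_alloc(void)      /* returns true on error */
    {
        return false;                /* always succeeds in this model */
    }

    /* Before: the helper forwards the error and every caller asserts. */
    static bool initial_alloc_old(void)
    {
        return try_alloc();
    }

    /* After: the assert lives in the helper, which now returns void. */
    static void initial_alloc_new(void)
    {
        bool err = try_alloc();
        assert(!err);
    }

    int main(void)
    {
        bool err = initial_alloc_old();   /* old style: caller checks */
        assert(!err);
        initial_alloc_new();              /* new style: nothing to check */
        puts("ok");
        return 0;
    }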

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
11
12
diff --git a/tcg/tcg.c b/tcg/tcg.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/tcg/tcg.c
15
+++ b/tcg/tcg.c
16
@@ -XXX,XX +XXX,XX @@ static bool tcg_region_alloc(TCGContext *s)
17
* Perform a context's first region allocation.
18
* This function does _not_ increment region.agg_size_full.
19
*/
20
-static inline bool tcg_region_initial_alloc__locked(TCGContext *s)
21
+static void tcg_region_initial_alloc__locked(TCGContext *s)
22
{
23
- return tcg_region_alloc__locked(s);
24
+ bool err = tcg_region_alloc__locked(s);
25
+ g_assert(!err);
26
}
27
28
/* Call from a safe-work context */
29
@@ -XXX,XX +XXX,XX @@ void tcg_region_reset_all(void)
30
31
for (i = 0; i < n_ctxs; i++) {
32
TCGContext *s = qatomic_read(&tcg_ctxs[i]);
33
- bool err = tcg_region_initial_alloc__locked(s);
34
-
35
- g_assert(!err);
36
+ tcg_region_initial_alloc__locked(s);
37
}
38
qemu_mutex_unlock(&region.lock);
39
40
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(void)
41
42
/* In user-mode we support only one ctx, so do the initial allocation now */
43
#ifdef CONFIG_USER_ONLY
44
- {
45
- bool err = tcg_region_initial_alloc__locked(tcg_ctx);
46
-
47
- g_assert(!err);
48
- }
49
+ tcg_region_initial_alloc__locked(tcg_ctx);
50
#endif
51
}
52
53
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
54
MachineState *ms = MACHINE(qdev_get_machine());
55
TCGContext *s = g_malloc(sizeof(*s));
56
unsigned int i, n;
57
- bool err;
58
59
*s = tcg_init_ctx;
60
61
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
62
63
tcg_ctx = s;
64
qemu_mutex_lock(&region.lock);
65
- err = tcg_region_initial_alloc__locked(tcg_ctx);
66
- g_assert(!err);
67
+ tcg_region_initial_alloc__locked(s);
68
qemu_mutex_unlock(&region.lock);
69
}
70
#endif /* !CONFIG_USER_ONLY */
71
--
72
2.25.1
73
74

tcg: Split out tcg_region_initial_alloc

This has only one user, and currently needs an ifdef,
but will make more sense after some code motion.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
10
11
diff --git a/tcg/tcg.c b/tcg/tcg.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/tcg/tcg.c
14
+++ b/tcg/tcg.c
15
@@ -XXX,XX +XXX,XX @@ static void tcg_region_initial_alloc__locked(TCGContext *s)
16
g_assert(!err);
17
}
18
19
+#ifndef CONFIG_USER_ONLY
20
+static void tcg_region_initial_alloc(TCGContext *s)
21
+{
22
+ qemu_mutex_lock(&region.lock);
23
+ tcg_region_initial_alloc__locked(s);
24
+ qemu_mutex_unlock(&region.lock);
25
+}
26
+#endif
27
+
28
/* Call from a safe-work context */
29
void tcg_region_reset_all(void)
30
{
31
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
32
}
33
34
tcg_ctx = s;
35
- qemu_mutex_lock(&region.lock);
36
- tcg_region_initial_alloc__locked(s);
37
- qemu_mutex_unlock(&region.lock);
38
+ tcg_region_initial_alloc(s);
39
}
40
#endif /* !CONFIG_USER_ONLY */
41
42
--
43
2.25.1
44
45

tcg: Split out tcg_region_prologue_set

This has only one user, but will make more sense after some
code motion.

Always leave the tcg_init_ctx initialized to the first region,
in preparation for tcg_prologue_init(). This also requires
that we don't re-allocate the region for the first cpu, lest
we hit the assertion for total number of regions allocated.
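
A rough model of the intent, with hypothetical names rather than the
real TCG interfaces: region 0 stays with the initial context from init
time, and only secondary threads grab a fresh region when they register.

    /* Toy model only; the real code guards next_region with a mutex. */
    #include <assert.h>
    #include <stdio.h>

    #define NR_REGIONS 4

    static int next_region;

    static int region_alloc(void)
    {
        assert(next_region < NR_REGIONS);   /* the assertion in question */
        return next_region++;
    }

    int main(void)
    {
        int init_ctx_region = region_alloc();   /* done once at region init */
        assert(init_ctx_region == 0);           /* the prologue goes here */

        /* Thread 0 reuses the initial context, so it must not allocate
         * again; only the additional threads take a new region. */
        for (int t = 1; t < 3; t++) {
            printf("thread %d -> region %d\n", t, region_alloc());
        }
        return 0;
    }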

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg.c | 37 ++++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)
15
16
diff --git a/tcg/tcg.c b/tcg/tcg.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/tcg/tcg.c
19
+++ b/tcg/tcg.c
20
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(void)
21
22
tcg_region_trees_init();
23
24
- /* In user-mode we support only one ctx, so do the initial allocation now */
25
-#ifdef CONFIG_USER_ONLY
26
- tcg_region_initial_alloc__locked(tcg_ctx);
27
-#endif
28
+ /*
29
+ * Leave the initial context initialized to the first region.
30
+ * This will be the context into which we generate the prologue.
31
+ * It is also the only context for CONFIG_USER_ONLY.
32
+ */
33
+ tcg_region_initial_alloc__locked(&tcg_init_ctx);
34
+}
35
+
36
+static void tcg_region_prologue_set(TCGContext *s)
37
+{
38
+ /* Deduct the prologue from the first region. */
39
+ g_assert(region.start == s->code_gen_buffer);
40
+ region.start = s->code_ptr;
41
+
42
+ /* Recompute boundaries of the first region. */
43
+ tcg_region_assign(s, 0);
44
+
45
+ /* Register the balance of the buffer with gdb. */
46
+ tcg_register_jit(tcg_splitwx_to_rx(region.start),
47
+ region.end - region.start);
48
}
49
50
#ifdef CONFIG_DEBUG_TCG
51
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
52
53
if (n > 0) {
54
alloc_tcg_plugin_context(s);
55
+ tcg_region_initial_alloc(s);
56
}
57
58
tcg_ctx = s;
59
- tcg_region_initial_alloc(s);
60
}
61
#endif /* !CONFIG_USER_ONLY */
62
63
@@ -XXX,XX +XXX,XX @@ void tcg_prologue_init(TCGContext *s)
64
{
65
size_t prologue_size;
66
67
- /* Put the prologue at the beginning of code_gen_buffer. */
68
- tcg_region_assign(s, 0);
69
s->code_ptr = s->code_gen_ptr;
70
s->code_buf = s->code_gen_ptr;
71
s->data_gen_ptr = NULL;
72
@@ -XXX,XX +XXX,XX @@ void tcg_prologue_init(TCGContext *s)
73
(uintptr_t)s->code_buf, prologue_size);
74
#endif
75
76
- /* Deduct the prologue from the first region. */
77
- region.start = s->code_ptr;
78
-
79
- /* Recompute boundaries of the first region. */
80
- tcg_region_assign(s, 0);
81
-
82
- tcg_register_jit(tcg_splitwx_to_rx(region.start),
83
- region.end - region.start);
84
+ tcg_region_prologue_set(s);
85
86
#ifdef DEBUG_DISAS
87
if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM)) {
88
--
89
2.25.1
90
91

tcg: Split out region.c

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg-internal.h | 37 +++
tcg/region.c | 572 +++++++++++++++++++++++++++++++++++++++++++++
tcg/tcg.c | 547 +------------------------------------------
tcg/meson.build | 1 +
4 files changed, 613 insertions(+), 544 deletions(-)
create mode 100644 tcg/tcg-internal.h
create mode 100644 tcg/region.c
1
12
13
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
14
new file mode 100644
15
index XXXXXXX..XXXXXXX
16
--- /dev/null
17
+++ b/tcg/tcg-internal.h
18
@@ -XXX,XX +XXX,XX @@
19
+/*
20
+ * Internal declarations for Tiny Code Generator for QEMU
21
+ *
22
+ * Copyright (c) 2008 Fabrice Bellard
23
+ *
24
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
25
+ * of this software and associated documentation files (the "Software"), to deal
26
+ * in the Software without restriction, including without limitation the rights
27
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
28
+ * copies of the Software, and to permit persons to whom the Software is
29
+ * furnished to do so, subject to the following conditions:
30
+ *
31
+ * The above copyright notice and this permission notice shall be included in
32
+ * all copies or substantial portions of the Software.
33
+ *
34
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
35
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
36
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
37
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
38
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
39
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
40
+ * THE SOFTWARE.
41
+ */
42
+
43
+#ifndef TCG_INTERNAL_H
44
+#define TCG_INTERNAL_H 1
45
+
46
+#define TCG_HIGHWATER 1024
47
+
48
+extern TCGContext **tcg_ctxs;
49
+extern unsigned int n_tcg_ctxs;
50
+
51
+bool tcg_region_alloc(TCGContext *s);
52
+void tcg_region_initial_alloc(TCGContext *s);
53
+void tcg_region_prologue_set(TCGContext *s);
54
+
55
+#endif /* TCG_INTERNAL_H */
56
diff --git a/tcg/region.c b/tcg/region.c
57
new file mode 100644
58
index XXXXXXX..XXXXXXX
59
--- /dev/null
60
+++ b/tcg/region.c
61
@@ -XXX,XX +XXX,XX @@
62
+/*
63
+ * Memory region management for Tiny Code Generator for QEMU
64
+ *
65
+ * Copyright (c) 2008 Fabrice Bellard
66
+ *
67
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
68
+ * of this software and associated documentation files (the "Software"), to deal
69
+ * in the Software without restriction, including without limitation the rights
70
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
71
+ * copies of the Software, and to permit persons to whom the Software is
72
+ * furnished to do so, subject to the following conditions:
73
+ *
74
+ * The above copyright notice and this permission notice shall be included in
75
+ * all copies or substantial portions of the Software.
76
+ *
77
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
78
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
79
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
80
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
81
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
82
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
83
+ * THE SOFTWARE.
84
+ */
85
+
86
+#include "qemu/osdep.h"
87
+#include "exec/exec-all.h"
88
+#include "tcg/tcg.h"
89
+#if !defined(CONFIG_USER_ONLY)
90
+#include "hw/boards.h"
91
+#endif
92
+#include "tcg-internal.h"
93
+
94
+
95
+struct tcg_region_tree {
96
+ QemuMutex lock;
97
+ GTree *tree;
98
+ /* padding to avoid false sharing is computed at run-time */
99
+};
100
+
101
+/*
102
+ * We divide code_gen_buffer into equally-sized "regions" that TCG threads
103
+ * dynamically allocate from as demand dictates. Given appropriate region
104
+ * sizing, this minimizes flushes even when some TCG threads generate a lot
105
+ * more code than others.
106
+ */
107
+struct tcg_region_state {
108
+ QemuMutex lock;
109
+
110
+ /* fields set at init time */
111
+ void *start;
112
+ void *start_aligned;
113
+ void *end;
114
+ size_t n;
115
+ size_t size; /* size of one region */
116
+ size_t stride; /* .size + guard size */
117
+
118
+ /* fields protected by the lock */
119
+ size_t current; /* current region index */
120
+ size_t agg_size_full; /* aggregate size of full regions */
121
+};
122
+
123
+static struct tcg_region_state region;
124
+
125
+/*
126
+ * This is an array of struct tcg_region_tree's, with padding.
127
+ * We use void * to simplify the computation of region_trees[i]; each
128
+ * struct is found every tree_size bytes.
129
+ */
130
+static void *region_trees;
131
+static size_t tree_size;
132
+
133
+/* compare a pointer @ptr and a tb_tc @s */
134
+static int ptr_cmp_tb_tc(const void *ptr, const struct tb_tc *s)
135
+{
136
+ if (ptr >= s->ptr + s->size) {
137
+ return 1;
138
+ } else if (ptr < s->ptr) {
139
+ return -1;
140
+ }
141
+ return 0;
142
+}
143
+
144
+static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
145
+{
146
+ const struct tb_tc *a = ap;
147
+ const struct tb_tc *b = bp;
148
+
149
+ /*
150
+ * When both sizes are set, we know this isn't a lookup.
151
+ * This is the most likely case: every TB must be inserted; lookups
152
+ * are a lot less frequent.
153
+ */
154
+ if (likely(a->size && b->size)) {
155
+ if (a->ptr > b->ptr) {
156
+ return 1;
157
+ } else if (a->ptr < b->ptr) {
158
+ return -1;
159
+ }
160
+ /* a->ptr == b->ptr should happen only on deletions */
161
+ g_assert(a->size == b->size);
162
+ return 0;
163
+ }
164
+ /*
165
+ * All lookups have either .size field set to 0.
166
+ * From the glib sources we see that @ap is always the lookup key. However
167
+ * the docs provide no guarantee, so we just mark this case as likely.
168
+ */
169
+ if (likely(a->size == 0)) {
170
+ return ptr_cmp_tb_tc(a->ptr, b);
171
+ }
172
+ return ptr_cmp_tb_tc(b->ptr, a);
173
+}
174
+
175
+static void tcg_region_trees_init(void)
176
+{
177
+ size_t i;
178
+
179
+ tree_size = ROUND_UP(sizeof(struct tcg_region_tree), qemu_dcache_linesize);
180
+ region_trees = qemu_memalign(qemu_dcache_linesize, region.n * tree_size);
181
+ for (i = 0; i < region.n; i++) {
182
+ struct tcg_region_tree *rt = region_trees + i * tree_size;
183
+
184
+ qemu_mutex_init(&rt->lock);
185
+ rt->tree = g_tree_new(tb_tc_cmp);
186
+ }
187
+}
188
+
189
+static struct tcg_region_tree *tc_ptr_to_region_tree(const void *p)
190
+{
191
+ size_t region_idx;
192
+
193
+ /*
194
+ * Like tcg_splitwx_to_rw, with no assert. The pc may come from
195
+ * a signal handler over which the caller has no control.
196
+ */
197
+ if (!in_code_gen_buffer(p)) {
198
+ p -= tcg_splitwx_diff;
199
+ if (!in_code_gen_buffer(p)) {
200
+ return NULL;
201
+ }
202
+ }
203
+
204
+ if (p < region.start_aligned) {
205
+ region_idx = 0;
206
+ } else {
207
+ ptrdiff_t offset = p - region.start_aligned;
208
+
209
+ if (offset > region.stride * (region.n - 1)) {
210
+ region_idx = region.n - 1;
211
+ } else {
212
+ region_idx = offset / region.stride;
213
+ }
214
+ }
215
+ return region_trees + region_idx * tree_size;
216
+}
217
+
218
+void tcg_tb_insert(TranslationBlock *tb)
219
+{
220
+ struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
221
+
222
+ g_assert(rt != NULL);
223
+ qemu_mutex_lock(&rt->lock);
224
+ g_tree_insert(rt->tree, &tb->tc, tb);
225
+ qemu_mutex_unlock(&rt->lock);
226
+}
227
+
228
+void tcg_tb_remove(TranslationBlock *tb)
229
+{
230
+ struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
231
+
232
+ g_assert(rt != NULL);
233
+ qemu_mutex_lock(&rt->lock);
234
+ g_tree_remove(rt->tree, &tb->tc);
235
+ qemu_mutex_unlock(&rt->lock);
236
+}
237
+
238
+/*
239
+ * Find the TB 'tb' such that
240
+ * tb->tc.ptr <= tc_ptr < tb->tc.ptr + tb->tc.size
241
+ * Return NULL if not found.
242
+ */
243
+TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr)
244
+{
245
+ struct tcg_region_tree *rt = tc_ptr_to_region_tree((void *)tc_ptr);
246
+ TranslationBlock *tb;
247
+ struct tb_tc s = { .ptr = (void *)tc_ptr };
248
+
249
+ if (rt == NULL) {
250
+ return NULL;
251
+ }
252
+
253
+ qemu_mutex_lock(&rt->lock);
254
+ tb = g_tree_lookup(rt->tree, &s);
255
+ qemu_mutex_unlock(&rt->lock);
256
+ return tb;
257
+}
258
+
259
+static void tcg_region_tree_lock_all(void)
260
+{
261
+ size_t i;
262
+
263
+ for (i = 0; i < region.n; i++) {
264
+ struct tcg_region_tree *rt = region_trees + i * tree_size;
265
+
266
+ qemu_mutex_lock(&rt->lock);
267
+ }
268
+}
269
+
270
+static void tcg_region_tree_unlock_all(void)
271
+{
272
+ size_t i;
273
+
274
+ for (i = 0; i < region.n; i++) {
275
+ struct tcg_region_tree *rt = region_trees + i * tree_size;
276
+
277
+ qemu_mutex_unlock(&rt->lock);
278
+ }
279
+}
280
+
281
+void tcg_tb_foreach(GTraverseFunc func, gpointer user_data)
282
+{
283
+ size_t i;
284
+
285
+ tcg_region_tree_lock_all();
286
+ for (i = 0; i < region.n; i++) {
287
+ struct tcg_region_tree *rt = region_trees + i * tree_size;
288
+
289
+ g_tree_foreach(rt->tree, func, user_data);
290
+ }
291
+ tcg_region_tree_unlock_all();
292
+}
293
+
294
+size_t tcg_nb_tbs(void)
295
+{
296
+ size_t nb_tbs = 0;
297
+ size_t i;
298
+
299
+ tcg_region_tree_lock_all();
300
+ for (i = 0; i < region.n; i++) {
301
+ struct tcg_region_tree *rt = region_trees + i * tree_size;
302
+
303
+ nb_tbs += g_tree_nnodes(rt->tree);
304
+ }
305
+ tcg_region_tree_unlock_all();
306
+ return nb_tbs;
307
+}
308
+
309
+static gboolean tcg_region_tree_traverse(gpointer k, gpointer v, gpointer data)
310
+{
311
+ TranslationBlock *tb = v;
312
+
313
+ tb_destroy(tb);
314
+ return FALSE;
315
+}
316
+
317
+static void tcg_region_tree_reset_all(void)
318
+{
319
+ size_t i;
320
+
321
+ tcg_region_tree_lock_all();
322
+ for (i = 0; i < region.n; i++) {
323
+ struct tcg_region_tree *rt = region_trees + i * tree_size;
324
+
325
+ g_tree_foreach(rt->tree, tcg_region_tree_traverse, NULL);
326
+ /* Increment the refcount first so that destroy acts as a reset */
327
+ g_tree_ref(rt->tree);
328
+ g_tree_destroy(rt->tree);
329
+ }
330
+ tcg_region_tree_unlock_all();
331
+}
332
+
333
+static void tcg_region_bounds(size_t curr_region, void **pstart, void **pend)
334
+{
335
+ void *start, *end;
336
+
337
+ start = region.start_aligned + curr_region * region.stride;
338
+ end = start + region.size;
339
+
340
+ if (curr_region == 0) {
341
+ start = region.start;
342
+ }
343
+ if (curr_region == region.n - 1) {
344
+ end = region.end;
345
+ }
346
+
347
+ *pstart = start;
348
+ *pend = end;
349
+}
350
+
351
+static void tcg_region_assign(TCGContext *s, size_t curr_region)
352
+{
353
+ void *start, *end;
354
+
355
+ tcg_region_bounds(curr_region, &start, &end);
356
+
357
+ s->code_gen_buffer = start;
358
+ s->code_gen_ptr = start;
359
+ s->code_gen_buffer_size = end - start;
360
+ s->code_gen_highwater = end - TCG_HIGHWATER;
361
+}
362
+
363
+static bool tcg_region_alloc__locked(TCGContext *s)
364
+{
365
+ if (region.current == region.n) {
366
+ return true;
367
+ }
368
+ tcg_region_assign(s, region.current);
369
+ region.current++;
370
+ return false;
371
+}
372
+
373
+/*
374
+ * Request a new region once the one in use has filled up.
375
+ * Returns true on error.
376
+ */
377
+bool tcg_region_alloc(TCGContext *s)
378
+{
379
+ bool err;
380
+ /* read the region size now; alloc__locked will overwrite it on success */
381
+ size_t size_full = s->code_gen_buffer_size;
382
+
383
+ qemu_mutex_lock(&region.lock);
384
+ err = tcg_region_alloc__locked(s);
385
+ if (!err) {
386
+ region.agg_size_full += size_full - TCG_HIGHWATER;
387
+ }
388
+ qemu_mutex_unlock(&region.lock);
389
+ return err;
390
+}
391
+
392
+/*
393
+ * Perform a context's first region allocation.
394
+ * This function does _not_ increment region.agg_size_full.
395
+ */
396
+static void tcg_region_initial_alloc__locked(TCGContext *s)
397
+{
398
+ bool err = tcg_region_alloc__locked(s);
399
+ g_assert(!err);
400
+}
401
+
402
+void tcg_region_initial_alloc(TCGContext *s)
403
+{
404
+ qemu_mutex_lock(&region.lock);
405
+ tcg_region_initial_alloc__locked(s);
406
+ qemu_mutex_unlock(&region.lock);
407
+}
408
+
409
+/* Call from a safe-work context */
410
+void tcg_region_reset_all(void)
411
+{
412
+ unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
413
+ unsigned int i;
414
+
415
+ qemu_mutex_lock(&region.lock);
416
+ region.current = 0;
417
+ region.agg_size_full = 0;
418
+
419
+ for (i = 0; i < n_ctxs; i++) {
420
+ TCGContext *s = qatomic_read(&tcg_ctxs[i]);
421
+ tcg_region_initial_alloc__locked(s);
422
+ }
423
+ qemu_mutex_unlock(&region.lock);
424
+
425
+ tcg_region_tree_reset_all();
426
+}
427
+
428
+#ifdef CONFIG_USER_ONLY
429
+static size_t tcg_n_regions(void)
430
+{
431
+ return 1;
432
+}
433
+#else
434
+/*
435
+ * It is likely that some vCPUs will translate more code than others, so we
436
+ * first try to set more regions than max_cpus, with those regions being of
437
+ * reasonable size. If that's not possible we make do by evenly dividing
438
+ * the code_gen_buffer among the vCPUs.
439
+ */
440
+static size_t tcg_n_regions(void)
441
+{
442
+ size_t i;
443
+
444
+ /* Use a single region if all we have is one vCPU thread */
445
+#if !defined(CONFIG_USER_ONLY)
446
+ MachineState *ms = MACHINE(qdev_get_machine());
447
+ unsigned int max_cpus = ms->smp.max_cpus;
448
+#endif
449
+ if (max_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
450
+ return 1;
451
+ }
452
+
453
+ /* Try to have more regions than max_cpus, with each region being >= 2 MB */
454
+ for (i = 8; i > 0; i--) {
455
+ size_t regions_per_thread = i;
456
+ size_t region_size;
457
+
458
+ region_size = tcg_init_ctx.code_gen_buffer_size;
459
+ region_size /= max_cpus * regions_per_thread;
460
+
461
+ if (region_size >= 2 * 1024u * 1024) {
462
+ return max_cpus * regions_per_thread;
463
+ }
464
+ }
465
+ /* If we can't, then just allocate one region per vCPU thread */
466
+ return max_cpus;
467
+}
468
+#endif
469
+
470
+/*
471
+ * Initializes region partitioning.
472
+ *
473
+ * Called at init time from the parent thread (i.e. the one calling
474
+ * tcg_context_init), after the target's TCG globals have been set.
475
+ *
476
+ * Region partitioning works by splitting code_gen_buffer into separate regions,
477
+ * and then assigning regions to TCG threads so that the threads can translate
478
+ * code in parallel without synchronization.
479
+ *
480
+ * In softmmu the number of TCG threads is bounded by max_cpus, so we use at
481
+ * least max_cpus regions in MTTCG. In !MTTCG we use a single region.
482
+ * Note that the TCG options from the command-line (i.e. -accel accel=tcg,[...])
483
+ * must have been parsed before calling this function, since it calls
484
+ * qemu_tcg_mttcg_enabled().
485
+ *
486
+ * In user-mode we use a single region. Having multiple regions in user-mode
487
+ * is not supported, because the number of vCPU threads (recall that each thread
488
+ * spawned by the guest corresponds to a vCPU thread) is only bounded by the
489
+ * OS, and usually this number is huge (tens of thousands is not uncommon).
490
+ * Thus, given this large bound on the number of vCPU threads and the fact
491
+ * that code_gen_buffer is allocated at compile-time, we cannot guarantee
492
+ * that the availability of at least one region per vCPU thread.
493
+ *
494
+ * However, this user-mode limitation is unlikely to be a significant problem
495
+ * in practice. Multi-threaded guests share most if not all of their translated
496
+ * code, which makes parallel code generation less appealing than in softmmu.
497
+ */
498
+void tcg_region_init(void)
499
+{
500
+ void *buf = tcg_init_ctx.code_gen_buffer;
501
+ void *aligned;
502
+ size_t size = tcg_init_ctx.code_gen_buffer_size;
503
+ size_t page_size = qemu_real_host_page_size;
504
+ size_t region_size;
505
+ size_t n_regions;
506
+ size_t i;
507
+
508
+ n_regions = tcg_n_regions();
509
+
510
+ /* The first region will be 'aligned - buf' bytes larger than the others */
511
+ aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
512
+ g_assert(aligned < tcg_init_ctx.code_gen_buffer + size);
513
+ /*
514
+ * Make region_size a multiple of page_size, using aligned as the start.
515
+ * As a result of this we might end up with a few extra pages at the end of
516
+ * the buffer; we will assign those to the last region.
517
+ */
518
+ region_size = (size - (aligned - buf)) / n_regions;
519
+ region_size = QEMU_ALIGN_DOWN(region_size, page_size);
520
+
521
+ /* A region must have at least 2 pages; one code, one guard */
522
+ g_assert(region_size >= 2 * page_size);
523
+
524
+ /* init the region struct */
525
+ qemu_mutex_init(&region.lock);
526
+ region.n = n_regions;
527
+ region.size = region_size - page_size;
528
+ region.stride = region_size;
529
+ region.start = buf;
530
+ region.start_aligned = aligned;
531
+ /* page-align the end, since its last page will be a guard page */
532
+ region.end = QEMU_ALIGN_PTR_DOWN(buf + size, page_size);
533
+ /* account for that last guard page */
534
+ region.end -= page_size;
535
+
536
+ /*
537
+ * Set guard pages in the rw buffer, as that's the one into which
538
+ * buffer overruns could occur. Do not set guard pages in the rx
539
+ * buffer -- let that one use hugepages throughout.
540
+ */
541
+ for (i = 0; i < region.n; i++) {
542
+ void *start, *end;
543
+
544
+ tcg_region_bounds(i, &start, &end);
545
+
546
+ /*
547
+ * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
548
+ * rejects a permission change from RWX -> NONE. Guard pages are
549
+ * nice for bug detection but are not essential; ignore any failure.
550
+ */
551
+ (void)qemu_mprotect_none(end, page_size);
552
+ }
553
+
554
+ tcg_region_trees_init();
555
+
556
+ /*
557
+ * Leave the initial context initialized to the first region.
558
+ * This will be the context into which we generate the prologue.
559
+ * It is also the only context for CONFIG_USER_ONLY.
560
+ */
561
+ tcg_region_initial_alloc__locked(&tcg_init_ctx);
562
+}
563
+
564
+void tcg_region_prologue_set(TCGContext *s)
565
+{
566
+ /* Deduct the prologue from the first region. */
567
+ g_assert(region.start == s->code_gen_buffer);
568
+ region.start = s->code_ptr;
569
+
570
+ /* Recompute boundaries of the first region. */
571
+ tcg_region_assign(s, 0);
572
+
573
+ /* Register the balance of the buffer with gdb. */
574
+ tcg_register_jit(tcg_splitwx_to_rx(region.start),
575
+ region.end - region.start);
576
+}
577
+
578
+/*
579
+ * Returns the size (in bytes) of all translated code (i.e. from all regions)
580
+ * currently in the cache.
581
+ * See also: tcg_code_capacity()
582
+ * Do not confuse with tcg_current_code_size(); that one applies to a single
583
+ * TCG context.
584
+ */
585
+size_t tcg_code_size(void)
586
+{
587
+ unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
588
+ unsigned int i;
589
+ size_t total;
590
+
591
+ qemu_mutex_lock(&region.lock);
592
+ total = region.agg_size_full;
593
+ for (i = 0; i < n_ctxs; i++) {
594
+ const TCGContext *s = qatomic_read(&tcg_ctxs[i]);
595
+ size_t size;
596
+
597
+ size = qatomic_read(&s->code_gen_ptr) - s->code_gen_buffer;
598
+ g_assert(size <= s->code_gen_buffer_size);
599
+ total += size;
600
+ }
601
+ qemu_mutex_unlock(&region.lock);
602
+ return total;
603
+}
604
+
605
+/*
606
+ * Returns the code capacity (in bytes) of the entire cache, i.e. including all
607
+ * regions.
608
+ * See also: tcg_code_size()
609
+ */
610
+size_t tcg_code_capacity(void)
611
+{
612
+ size_t guard_size, capacity;
613
+
614
+ /* no need for synchronization; these variables are set at init time */
615
+ guard_size = region.stride - region.size;
616
+ capacity = region.end + guard_size - region.start;
617
+ capacity -= region.n * (guard_size + TCG_HIGHWATER);
618
+ return capacity;
619
+}
620
+
621
+size_t tcg_tb_phys_invalidate_count(void)
622
+{
623
+ unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
624
+ unsigned int i;
625
+ size_t total = 0;
626
+
627
+ for (i = 0; i < n_ctxs; i++) {
628
+ const TCGContext *s = qatomic_read(&tcg_ctxs[i]);
629
+
630
+ total += qatomic_read(&s->tb_phys_invalidate_count);
631
+ }
632
+ return total;
633
+}
634
diff --git a/tcg/tcg.c b/tcg/tcg.c
635
index XXXXXXX..XXXXXXX 100644
636
--- a/tcg/tcg.c
637
+++ b/tcg/tcg.c
638
@@ -XXX,XX +XXX,XX @@
639
640
#include "elf.h"
641
#include "exec/log.h"
642
+#include "tcg-internal.h"
643
644
/* Forward declarations for functions declared in tcg-target.c.inc and
645
used here. */
646
@@ -XXX,XX +XXX,XX @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
647
static int tcg_out_ldst_finalize(TCGContext *s);
648
#endif
649
650
-#define TCG_HIGHWATER 1024
651
-
652
-static TCGContext **tcg_ctxs;
653
-static unsigned int n_tcg_ctxs;
654
+TCGContext **tcg_ctxs;
655
+unsigned int n_tcg_ctxs;
656
TCGv_env cpu_env = 0;
657
const void *tcg_code_gen_epilogue;
658
uintptr_t tcg_splitwx_diff;
659
@@ -XXX,XX +XXX,XX @@ uintptr_t tcg_splitwx_diff;
660
tcg_prologue_fn *tcg_qemu_tb_exec;
661
#endif
662
663
-struct tcg_region_tree {
664
- QemuMutex lock;
665
- GTree *tree;
666
- /* padding to avoid false sharing is computed at run-time */
667
-};
668
-
669
-/*
670
- * We divide code_gen_buffer into equally-sized "regions" that TCG threads
671
- * dynamically allocate from as demand dictates. Given appropriate region
672
- * sizing, this minimizes flushes even when some TCG threads generate a lot
673
- * more code than others.
674
- */
675
-struct tcg_region_state {
676
- QemuMutex lock;
677
-
678
- /* fields set at init time */
679
- void *start;
680
- void *start_aligned;
681
- void *end;
682
- size_t n;
683
- size_t size; /* size of one region */
684
- size_t stride; /* .size + guard size */
685
-
686
- /* fields protected by the lock */
687
- size_t current; /* current region index */
688
- size_t agg_size_full; /* aggregate size of full regions */
689
-};
690
-
691
-static struct tcg_region_state region;
692
-/*
693
- * This is an array of struct tcg_region_tree's, with padding.
694
- * We use void * to simplify the computation of region_trees[i]; each
695
- * struct is found every tree_size bytes.
696
- */
697
-static void *region_trees;
698
-static size_t tree_size;
699
static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];
700
static TCGRegSet tcg_target_call_clobber_regs;
701
702
@@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef constraint_sets[] = {
703
704
#include "tcg-target.c.inc"
705
706
-/* compare a pointer @ptr and a tb_tc @s */
707
-static int ptr_cmp_tb_tc(const void *ptr, const struct tb_tc *s)
708
-{
709
- if (ptr >= s->ptr + s->size) {
710
- return 1;
711
- } else if (ptr < s->ptr) {
712
- return -1;
713
- }
714
- return 0;
715
-}
716
-
717
-static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
718
-{
719
- const struct tb_tc *a = ap;
720
- const struct tb_tc *b = bp;
721
-
722
- /*
723
- * When both sizes are set, we know this isn't a lookup.
724
- * This is the most likely case: every TB must be inserted; lookups
725
- * are a lot less frequent.
726
- */
727
- if (likely(a->size && b->size)) {
728
- if (a->ptr > b->ptr) {
729
- return 1;
730
- } else if (a->ptr < b->ptr) {
731
- return -1;
732
- }
733
- /* a->ptr == b->ptr should happen only on deletions */
734
- g_assert(a->size == b->size);
735
- return 0;
736
- }
737
- /*
738
- * All lookups have either .size field set to 0.
739
- * From the glib sources we see that @ap is always the lookup key. However
740
- * the docs provide no guarantee, so we just mark this case as likely.
741
- */
742
- if (likely(a->size == 0)) {
743
- return ptr_cmp_tb_tc(a->ptr, b);
744
- }
745
- return ptr_cmp_tb_tc(b->ptr, a);
746
-}
747
-
748
-static void tcg_region_trees_init(void)
749
-{
750
- size_t i;
751
-
752
- tree_size = ROUND_UP(sizeof(struct tcg_region_tree), qemu_dcache_linesize);
753
- region_trees = qemu_memalign(qemu_dcache_linesize, region.n * tree_size);
754
- for (i = 0; i < region.n; i++) {
755
- struct tcg_region_tree *rt = region_trees + i * tree_size;
756
-
757
- qemu_mutex_init(&rt->lock);
758
- rt->tree = g_tree_new(tb_tc_cmp);
759
- }
760
-}
761
-
762
-static struct tcg_region_tree *tc_ptr_to_region_tree(const void *p)
763
-{
764
- size_t region_idx;
765
-
766
- /*
767
- * Like tcg_splitwx_to_rw, with no assert. The pc may come from
768
- * a signal handler over which the caller has no control.
769
- */
770
- if (!in_code_gen_buffer(p)) {
771
- p -= tcg_splitwx_diff;
772
- if (!in_code_gen_buffer(p)) {
773
- return NULL;
774
- }
775
- }
776
-
777
- if (p < region.start_aligned) {
778
- region_idx = 0;
779
- } else {
780
- ptrdiff_t offset = p - region.start_aligned;
781
-
782
- if (offset > region.stride * (region.n - 1)) {
783
- region_idx = region.n - 1;
784
- } else {
785
- region_idx = offset / region.stride;
786
- }
787
- }
788
- return region_trees + region_idx * tree_size;
789
-}
790
-
791
-void tcg_tb_insert(TranslationBlock *tb)
792
-{
793
- struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
794
-
795
- g_assert(rt != NULL);
796
- qemu_mutex_lock(&rt->lock);
797
- g_tree_insert(rt->tree, &tb->tc, tb);
798
- qemu_mutex_unlock(&rt->lock);
799
-}
800
-
801
-void tcg_tb_remove(TranslationBlock *tb)
802
-{
803
- struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
804
-
805
- g_assert(rt != NULL);
806
- qemu_mutex_lock(&rt->lock);
807
- g_tree_remove(rt->tree, &tb->tc);
808
- qemu_mutex_unlock(&rt->lock);
809
-}
810
-
811
-/*
812
- * Find the TB 'tb' such that
813
- * tb->tc.ptr <= tc_ptr < tb->tc.ptr + tb->tc.size
814
- * Return NULL if not found.
815
- */
816
-TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr)
817
-{
818
- struct tcg_region_tree *rt = tc_ptr_to_region_tree((void *)tc_ptr);
819
- TranslationBlock *tb;
820
- struct tb_tc s = { .ptr = (void *)tc_ptr };
821
-
822
- if (rt == NULL) {
823
- return NULL;
824
- }
825
-
826
- qemu_mutex_lock(&rt->lock);
827
- tb = g_tree_lookup(rt->tree, &s);
828
- qemu_mutex_unlock(&rt->lock);
829
- return tb;
830
-}
831
-
832
-static void tcg_region_tree_lock_all(void)
833
-{
834
- size_t i;
835
-
836
- for (i = 0; i < region.n; i++) {
837
- struct tcg_region_tree *rt = region_trees + i * tree_size;
838
-
839
- qemu_mutex_lock(&rt->lock);
840
- }
841
-}
842
-
843
-static void tcg_region_tree_unlock_all(void)
844
-{
845
- size_t i;
846
-
847
- for (i = 0; i < region.n; i++) {
848
- struct tcg_region_tree *rt = region_trees + i * tree_size;
849
-
850
- qemu_mutex_unlock(&rt->lock);
851
- }
852
-}
853
-
854
-void tcg_tb_foreach(GTraverseFunc func, gpointer user_data)
855
-{
856
- size_t i;
857
-
858
- tcg_region_tree_lock_all();
859
- for (i = 0; i < region.n; i++) {
860
- struct tcg_region_tree *rt = region_trees + i * tree_size;
861
-
862
- g_tree_foreach(rt->tree, func, user_data);
863
- }
864
- tcg_region_tree_unlock_all();
865
-}
866
-
867
-size_t tcg_nb_tbs(void)
868
-{
869
- size_t nb_tbs = 0;
870
- size_t i;
871
-
872
- tcg_region_tree_lock_all();
873
- for (i = 0; i < region.n; i++) {
874
- struct tcg_region_tree *rt = region_trees + i * tree_size;
875
-
876
- nb_tbs += g_tree_nnodes(rt->tree);
877
- }
878
- tcg_region_tree_unlock_all();
879
- return nb_tbs;
880
-}
881
-
882
-static gboolean tcg_region_tree_traverse(gpointer k, gpointer v, gpointer data)
883
-{
884
- TranslationBlock *tb = v;
885
-
886
- tb_destroy(tb);
887
- return FALSE;
888
-}
889
-
890
-static void tcg_region_tree_reset_all(void)
891
-{
892
- size_t i;
893
-
894
- tcg_region_tree_lock_all();
895
- for (i = 0; i < region.n; i++) {
896
- struct tcg_region_tree *rt = region_trees + i * tree_size;
897
-
898
- g_tree_foreach(rt->tree, tcg_region_tree_traverse, NULL);
899
- /* Increment the refcount first so that destroy acts as a reset */
900
- g_tree_ref(rt->tree);
901
- g_tree_destroy(rt->tree);
902
- }
903
- tcg_region_tree_unlock_all();
904
-}
905
-
906
-static void tcg_region_bounds(size_t curr_region, void **pstart, void **pend)
907
-{
908
- void *start, *end;
909
-
910
- start = region.start_aligned + curr_region * region.stride;
911
- end = start + region.size;
912
-
913
- if (curr_region == 0) {
914
- start = region.start;
915
- }
916
- if (curr_region == region.n - 1) {
917
- end = region.end;
918
- }
919
-
920
- *pstart = start;
921
- *pend = end;
922
-}
923
-
924
-static void tcg_region_assign(TCGContext *s, size_t curr_region)
925
-{
926
- void *start, *end;
927
-
928
- tcg_region_bounds(curr_region, &start, &end);
929
-
930
- s->code_gen_buffer = start;
931
- s->code_gen_ptr = start;
932
- s->code_gen_buffer_size = end - start;
933
- s->code_gen_highwater = end - TCG_HIGHWATER;
934
-}
935
-
936
-static bool tcg_region_alloc__locked(TCGContext *s)
937
-{
938
- if (region.current == region.n) {
939
- return true;
940
- }
941
- tcg_region_assign(s, region.current);
942
- region.current++;
943
- return false;
944
-}
945
-
946
-/*
947
- * Request a new region once the one in use has filled up.
948
- * Returns true on error.
949
- */
950
-static bool tcg_region_alloc(TCGContext *s)
951
-{
952
- bool err;
953
- /* read the region size now; alloc__locked will overwrite it on success */
954
- size_t size_full = s->code_gen_buffer_size;
955
-
956
- qemu_mutex_lock(&region.lock);
957
- err = tcg_region_alloc__locked(s);
958
- if (!err) {
959
- region.agg_size_full += size_full - TCG_HIGHWATER;
960
- }
961
- qemu_mutex_unlock(&region.lock);
962
- return err;
963
-}
964
-
965
-/*
966
- * Perform a context's first region allocation.
967
- * This function does _not_ increment region.agg_size_full.
968
- */
969
-static void tcg_region_initial_alloc__locked(TCGContext *s)
970
-{
971
- bool err = tcg_region_alloc__locked(s);
972
- g_assert(!err);
973
-}
974
-
975
-#ifndef CONFIG_USER_ONLY
976
-static void tcg_region_initial_alloc(TCGContext *s)
977
-{
978
- qemu_mutex_lock(&region.lock);
979
- tcg_region_initial_alloc__locked(s);
980
- qemu_mutex_unlock(&region.lock);
981
-}
982
-#endif
983
-
984
-/* Call from a safe-work context */
985
-void tcg_region_reset_all(void)
986
-{
987
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
988
- unsigned int i;
989
-
990
- qemu_mutex_lock(&region.lock);
991
- region.current = 0;
992
- region.agg_size_full = 0;
993
-
994
- for (i = 0; i < n_ctxs; i++) {
995
- TCGContext *s = qatomic_read(&tcg_ctxs[i]);
996
- tcg_region_initial_alloc__locked(s);
997
- }
998
- qemu_mutex_unlock(&region.lock);
999
-
1000
- tcg_region_tree_reset_all();
1001
-}
1002
-
1003
-#ifdef CONFIG_USER_ONLY
1004
-static size_t tcg_n_regions(void)
1005
-{
1006
- return 1;
1007
-}
1008
-#else
1009
-/*
1010
- * It is likely that some vCPUs will translate more code than others, so we
1011
- * first try to set more regions than max_cpus, with those regions being of
1012
- * reasonable size. If that's not possible we make do by evenly dividing
1013
- * the code_gen_buffer among the vCPUs.
1014
- */
1015
-static size_t tcg_n_regions(void)
1016
-{
1017
- size_t i;
1018
-
1019
- /* Use a single region if all we have is one vCPU thread */
1020
-#if !defined(CONFIG_USER_ONLY)
1021
- MachineState *ms = MACHINE(qdev_get_machine());
1022
- unsigned int max_cpus = ms->smp.max_cpus;
1023
-#endif
1024
- if (max_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
1025
- return 1;
1026
- }
1027
-
1028
- /* Try to have more regions than max_cpus, with each region being >= 2 MB */
1029
- for (i = 8; i > 0; i--) {
1030
- size_t regions_per_thread = i;
1031
- size_t region_size;
1032
-
1033
- region_size = tcg_init_ctx.code_gen_buffer_size;
1034
- region_size /= max_cpus * regions_per_thread;
1035
-
1036
- if (region_size >= 2 * 1024u * 1024) {
1037
- return max_cpus * regions_per_thread;
1038
- }
1039
- }
1040
- /* If we can't, then just allocate one region per vCPU thread */
1041
- return max_cpus;
1042
-}
1043
-#endif
1044
-
1045
-/*
1046
- * Initializes region partitioning.
1047
- *
1048
- * Called at init time from the parent thread (i.e. the one calling
1049
- * tcg_context_init), after the target's TCG globals have been set.
1050
- *
1051
- * Region partitioning works by splitting code_gen_buffer into separate regions,
1052
- * and then assigning regions to TCG threads so that the threads can translate
1053
- * code in parallel without synchronization.
1054
- *
1055
- * In softmmu the number of TCG threads is bounded by max_cpus, so we use at
1056
- * least max_cpus regions in MTTCG. In !MTTCG we use a single region.
1057
- * Note that the TCG options from the command-line (i.e. -accel accel=tcg,[...])
1058
- * must have been parsed before calling this function, since it calls
1059
- * qemu_tcg_mttcg_enabled().
1060
- *
1061
- * In user-mode we use a single region. Having multiple regions in user-mode
1062
- * is not supported, because the number of vCPU threads (recall that each thread
1063
- * spawned by the guest corresponds to a vCPU thread) is only bounded by the
1064
- * OS, and usually this number is huge (tens of thousands is not uncommon).
1065
- * Thus, given this large bound on the number of vCPU threads and the fact
1066
- * that code_gen_buffer is allocated at compile-time, we cannot guarantee
1067
- * that the availability of at least one region per vCPU thread.
1068
- *
1069
- * However, this user-mode limitation is unlikely to be a significant problem
1070
- * in practice. Multi-threaded guests share most if not all of their translated
1071
- * code, which makes parallel code generation less appealing than in softmmu.
1072
- */
1073
-void tcg_region_init(void)
1074
-{
1075
- void *buf = tcg_init_ctx.code_gen_buffer;
1076
- void *aligned;
1077
- size_t size = tcg_init_ctx.code_gen_buffer_size;
1078
- size_t page_size = qemu_real_host_page_size;
1079
- size_t region_size;
1080
- size_t n_regions;
1081
- size_t i;
1082
-
1083
- n_regions = tcg_n_regions();
1084
-
1085
- /* The first region will be 'aligned - buf' bytes larger than the others */
1086
- aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
1087
- g_assert(aligned < tcg_init_ctx.code_gen_buffer + size);
1088
- /*
1089
- * Make region_size a multiple of page_size, using aligned as the start.
1090
- * As a result of this we might end up with a few extra pages at the end of
1091
- * the buffer; we will assign those to the last region.
1092
- */
1093
- region_size = (size - (aligned - buf)) / n_regions;
1094
- region_size = QEMU_ALIGN_DOWN(region_size, page_size);
1095
-
1096
- /* A region must have at least 2 pages; one code, one guard */
1097
- g_assert(region_size >= 2 * page_size);
1098
-
1099
- /* init the region struct */
1100
- qemu_mutex_init(&region.lock);
1101
- region.n = n_regions;
1102
- region.size = region_size - page_size;
1103
- region.stride = region_size;
1104
- region.start = buf;
1105
- region.start_aligned = aligned;
1106
- /* page-align the end, since its last page will be a guard page */
1107
- region.end = QEMU_ALIGN_PTR_DOWN(buf + size, page_size);
1108
- /* account for that last guard page */
1109
- region.end -= page_size;
1110
-
1111
- /*
1112
- * Set guard pages in the rw buffer, as that's the one into which
1113
- * buffer overruns could occur. Do not set guard pages in the rx
1114
- * buffer -- let that one use hugepages throughout.
1115
- */
1116
- for (i = 0; i < region.n; i++) {
1117
- void *start, *end;
1118
-
1119
- tcg_region_bounds(i, &start, &end);
1120
-
1121
- /*
1122
- * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
1123
- * rejects a permission change from RWX -> NONE. Guard pages are
1124
- * nice for bug detection but are not essential; ignore any failure.
1125
- */
1126
- (void)qemu_mprotect_none(end, page_size);
1127
- }
1128
-
1129
- tcg_region_trees_init();
1130
-
1131
- /*
1132
- * Leave the initial context initialized to the first region.
1133
- * This will be the context into which we generate the prologue.
1134
- * It is also the only context for CONFIG_USER_ONLY.
1135
- */
1136
- tcg_region_initial_alloc__locked(&tcg_init_ctx);
1137
-}
1138
-
1139
-static void tcg_region_prologue_set(TCGContext *s)
1140
-{
1141
- /* Deduct the prologue from the first region. */
1142
- g_assert(region.start == s->code_gen_buffer);
1143
- region.start = s->code_ptr;
1144
-
1145
- /* Recompute boundaries of the first region. */
1146
- tcg_region_assign(s, 0);
1147
-
1148
- /* Register the balance of the buffer with gdb. */
1149
- tcg_register_jit(tcg_splitwx_to_rx(region.start),
1150
- region.end - region.start);
1151
-}
1152
-
1153
#ifdef CONFIG_DEBUG_TCG
1154
const void *tcg_splitwx_to_rx(void *rw)
1155
{
1156
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
1157
}
1158
#endif /* !CONFIG_USER_ONLY */
1159
1160
-/*
1161
- * Returns the size (in bytes) of all translated code (i.e. from all regions)
1162
- * currently in the cache.
1163
- * See also: tcg_code_capacity()
1164
- * Do not confuse with tcg_current_code_size(); that one applies to a single
1165
- * TCG context.
1166
- */
1167
-size_t tcg_code_size(void)
1168
-{
1169
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
1170
- unsigned int i;
1171
- size_t total;
1172
-
1173
- qemu_mutex_lock(&region.lock);
1174
- total = region.agg_size_full;
1175
- for (i = 0; i < n_ctxs; i++) {
1176
- const TCGContext *s = qatomic_read(&tcg_ctxs[i]);
1177
- size_t size;
1178
-
1179
- size = qatomic_read(&s->code_gen_ptr) - s->code_gen_buffer;
1180
- g_assert(size <= s->code_gen_buffer_size);
1181
- total += size;
1182
- }
1183
- qemu_mutex_unlock(&region.lock);
1184
- return total;
1185
-}
1186
-
1187
-/*
1188
- * Returns the code capacity (in bytes) of the entire cache, i.e. including all
1189
- * regions.
1190
- * See also: tcg_code_size()
1191
- */
1192
-size_t tcg_code_capacity(void)
1193
-{
1194
- size_t guard_size, capacity;
1195
-
1196
- /* no need for synchronization; these variables are set at init time */
1197
- guard_size = region.stride - region.size;
1198
- capacity = region.end + guard_size - region.start;
1199
- capacity -= region.n * (guard_size + TCG_HIGHWATER);
1200
- return capacity;
1201
-}
1202
-
1203
-size_t tcg_tb_phys_invalidate_count(void)
1204
-{
1205
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
1206
- unsigned int i;
1207
- size_t total = 0;
1208
-
1209
- for (i = 0; i < n_ctxs; i++) {
1210
- const TCGContext *s = qatomic_read(&tcg_ctxs[i]);
1211
-
1212
- total += qatomic_read(&s->tb_phys_invalidate_count);
1213
- }
1214
- return total;
1215
-}
1216
-
1217
/* pool based memory allocation */
1218
void *tcg_malloc_internal(TCGContext *s, int size)
1219
{
1220
diff --git a/tcg/meson.build b/tcg/meson.build
1221
index XXXXXXX..XXXXXXX 100644
1222
--- a/tcg/meson.build
1223
+++ b/tcg/meson.build
1224
@@ -XXX,XX +XXX,XX @@ tcg_ss = ss.source_set()
1225
1226
tcg_ss.add(files(
1227
'optimize.c',
1228
+ 'region.c',
1229
'tcg.c',
1230
'tcg-common.c',
1231
'tcg-op.c',
1232
--
1233
2.25.1
1234
1235
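The partitioning arithmetic in the code moved out above is easy to lose in the diff noise: page-align the start of code_gen_buffer, split the remainder into equal page-multiple strides, and give up the last page of each stride as a guard page. A small standalone sketch of that calculation, with made-up values and without QEMU's helpers:

    /* Illustrative sketch of the code_gen_buffer partitioning arithmetic. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    int main(void)
    {
        const size_t page_size = 4096;                 /* assume 4 KiB host pages */
        const size_t buf_size  = 1024 * 1024 * 1024;   /* assume a 1 GiB buffer */
        const size_t n_regions = 8;                    /* e.g. max_cpus under MTTCG */
        const uintptr_t buf    = 0x10000100;           /* deliberately not page aligned */

        /* Page-align the start; the first region absorbs the slack. */
        uintptr_t aligned = (buf + page_size - 1) & ~(uintptr_t)(page_size - 1);

        /* Per-region stride, rounded down to a multiple of the page size. */
        size_t stride = ((buf_size - (aligned - buf)) / n_regions)
                        & ~(size_t)(page_size - 1);

        /* The last page of each stride is a guard page, so the usable
           size per region is one page smaller than the stride. */
        size_t usable = stride - page_size;

        printf("stride %zu bytes, usable %zu bytes per region\n", stride, usable);
        return 0;
    }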
New patch
1
It consists of one function call and has only one caller.
1
2
3
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
4
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
---
8
accel/tcg/translate-all.c | 7 +------
9
1 file changed, 1 insertion(+), 6 deletions(-)
10
11
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/accel/tcg/translate-all.c
14
+++ b/accel/tcg/translate-all.c
15
@@ -XXX,XX +XXX,XX @@ static void page_table_config_init(void)
16
assert(v_l2_levels >= 0);
17
}
18
19
-static void cpu_gen_init(void)
20
-{
21
- tcg_context_init(&tcg_init_ctx);
22
-}
23
-
24
/* Encode VAL as a signed leb128 sequence at P.
25
Return P incremented past the encoded value. */
26
static uint8_t *encode_sleb128(uint8_t *p, target_long val)
27
@@ -XXX,XX +XXX,XX @@ void tcg_exec_init(unsigned long tb_size, int splitwx)
28
bool ok;
29
30
tcg_allowed = true;
31
- cpu_gen_init();
32
+ tcg_context_init(&tcg_init_ctx);
33
page_init();
34
tb_htable_init();
35
36
--
37
2.25.1
38
39
New patch
1
Buffer management is integral to tcg. Do not leave the allocation
2
to code outside of tcg/. This is code movement, with further
3
cleanups to follow.
1
4
5
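In outline, the division of labour after this patch is as in the stub model below. This is an illustration only, not QEMU code: error handling, the static-buffer and Windows paths, and the mips 256MB constraint are all omitted, and the bodies are placeholders.

    #include <stdio.h>
    #include <stddef.h>

    static void alloc_code_gen_buffer(size_t size, int splitwx)
    {
        /* In QEMU this mmaps (or VirtualAllocs) the JIT buffer; stubbed here. */
        printf("allocate %zu byte buffer, splitwx=%d\n", size, splitwx);
    }

    static void tcg_region_init(size_t tb_size, int splitwx)
    {
        /* The region code now owns sizing and allocating code_gen_buffer... */
        size_t size = tb_size ? tb_size : 128 * 1024 * 1024;   /* illustrative default */
        alloc_code_gen_buffer(size, splitwx);
        /* ...and then partitions it into per-thread regions as before. */
    }

    static void tcg_exec_init(size_t tb_size, int splitwx)
    {
        /* The accelerator code no longer touches the buffer directly. */
        tcg_region_init(tb_size, splitwx);
    }

    int main(void)
    {
        tcg_exec_init(0, -1);
        return 0;
    }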
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
6
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
9
include/tcg/tcg.h | 2 +-
10
accel/tcg/translate-all.c | 414 +-----------------------------------
11
tcg/region.c | 431 +++++++++++++++++++++++++++++++++++++-
12
3 files changed, 428 insertions(+), 419 deletions(-)
13
14
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
15
index XXXXXXX..XXXXXXX 100644
16
--- a/include/tcg/tcg.h
17
+++ b/include/tcg/tcg.h
18
@@ -XXX,XX +XXX,XX @@ void *tcg_malloc_internal(TCGContext *s, int size);
19
void tcg_pool_reset(TCGContext *s);
20
TranslationBlock *tcg_tb_alloc(TCGContext *s);
21
22
-void tcg_region_init(void);
23
+void tcg_region_init(size_t tb_size, int splitwx);
24
void tb_destroy(TranslationBlock *tb);
25
void tcg_region_reset_all(void);
26
27
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
28
index XXXXXXX..XXXXXXX 100644
29
--- a/accel/tcg/translate-all.c
30
+++ b/accel/tcg/translate-all.c
31
@@ -XXX,XX +XXX,XX @@
32
*/
33
34
#include "qemu/osdep.h"
35
-#include "qemu/units.h"
36
#include "qemu-common.h"
37
38
#define NO_CPU_IO_DEFS
39
@@ -XXX,XX +XXX,XX @@
40
#include "exec/cputlb.h"
41
#include "exec/translate-all.h"
42
#include "qemu/bitmap.h"
43
-#include "qemu/error-report.h"
44
#include "qemu/qemu-print.h"
45
#include "qemu/timer.h"
46
#include "qemu/main-loop.h"
47
@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
48
}
49
}
50
51
-/* Minimum size of the code gen buffer. This number is randomly chosen,
52
- but not so small that we can't have a fair number of TB's live. */
53
-#define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
54
-
55
-/* Maximum size of the code gen buffer we'd like to use. Unless otherwise
56
- indicated, this is constrained by the range of direct branches on the
57
- host cpu, as used by the TCG implementation of goto_tb. */
58
-#if defined(__x86_64__)
59
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
60
-#elif defined(__sparc__)
61
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
62
-#elif defined(__powerpc64__)
63
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
64
-#elif defined(__powerpc__)
65
-# define MAX_CODE_GEN_BUFFER_SIZE (32 * MiB)
66
-#elif defined(__aarch64__)
67
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
68
-#elif defined(__s390x__)
69
- /* We have a +- 4GB range on the branches; leave some slop. */
70
-# define MAX_CODE_GEN_BUFFER_SIZE (3 * GiB)
71
-#elif defined(__mips__)
72
- /* We have a 256MB branch region, but leave room to make sure the
73
- main executable is also within that region. */
74
-# define MAX_CODE_GEN_BUFFER_SIZE (128 * MiB)
75
-#else
76
-# define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
77
-#endif
78
-
79
-#if TCG_TARGET_REG_BITS == 32
80
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
81
-#ifdef CONFIG_USER_ONLY
82
-/*
83
- * For user mode on smaller 32 bit systems we may run into trouble
84
- * allocating big chunks of data in the right place. On these systems
85
- * we utilise a static code generation buffer directly in the binary.
86
- */
87
-#define USE_STATIC_CODE_GEN_BUFFER
88
-#endif
89
-#else /* TCG_TARGET_REG_BITS == 64 */
90
-#ifdef CONFIG_USER_ONLY
91
-/*
92
- * As user-mode emulation typically means running multiple instances
93
- * of the translator don't go too nuts with our default code gen
94
- * buffer lest we make things too hard for the OS.
95
- */
96
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
97
-#else
98
-/*
99
- * We expect most system emulation to run one or two guests per host.
100
- * Users running large scale system emulation may want to tweak their
101
- * runtime setup via the tb-size control on the command line.
102
- */
103
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
104
-#endif
105
-#endif
106
-
107
-#define DEFAULT_CODE_GEN_BUFFER_SIZE \
108
- (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
109
- ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
110
-
111
-static size_t size_code_gen_buffer(size_t tb_size)
112
-{
113
- /* Size the buffer. */
114
- if (tb_size == 0) {
115
- size_t phys_mem = qemu_get_host_physmem();
116
- if (phys_mem == 0) {
117
- tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
118
- } else {
119
- tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
120
- }
121
- }
122
- if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
123
- tb_size = MIN_CODE_GEN_BUFFER_SIZE;
124
- }
125
- if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
126
- tb_size = MAX_CODE_GEN_BUFFER_SIZE;
127
- }
128
- return tb_size;
129
-}
130
-
131
-#ifdef __mips__
132
-/* In order to use J and JAL within the code_gen_buffer, we require
133
- that the buffer not cross a 256MB boundary. */
134
-static inline bool cross_256mb(void *addr, size_t size)
135
-{
136
- return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & ~0x0ffffffful;
137
-}
138
-
139
-/* We weren't able to allocate a buffer without crossing that boundary,
140
- so make do with the larger portion of the buffer that doesn't cross.
141
- Returns the new base of the buffer, and adjusts code_gen_buffer_size. */
142
-static inline void *split_cross_256mb(void *buf1, size_t size1)
143
-{
144
- void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
145
- size_t size2 = buf1 + size1 - buf2;
146
-
147
- size1 = buf2 - buf1;
148
- if (size1 < size2) {
149
- size1 = size2;
150
- buf1 = buf2;
151
- }
152
-
153
- tcg_ctx->code_gen_buffer_size = size1;
154
- return buf1;
155
-}
156
-#endif
157
-
158
-#ifdef USE_STATIC_CODE_GEN_BUFFER
159
-static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
160
- __attribute__((aligned(CODE_GEN_ALIGN)));
161
-
162
-static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
163
-{
164
- void *buf, *end;
165
- size_t size;
166
-
167
- if (splitwx > 0) {
168
- error_setg(errp, "jit split-wx not supported");
169
- return false;
170
- }
171
-
172
- /* page-align the beginning and end of the buffer */
173
- buf = static_code_gen_buffer;
174
- end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
175
- buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
176
- end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
177
-
178
- size = end - buf;
179
-
180
- /* Honor a command-line option limiting the size of the buffer. */
181
- if (size > tb_size) {
182
- size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
183
- }
184
- tcg_ctx->code_gen_buffer_size = size;
185
-
186
-#ifdef __mips__
187
- if (cross_256mb(buf, size)) {
188
- buf = split_cross_256mb(buf, size);
189
- size = tcg_ctx->code_gen_buffer_size;
190
- }
191
-#endif
192
-
193
- if (qemu_mprotect_rwx(buf, size)) {
194
- error_setg_errno(errp, errno, "mprotect of jit buffer");
195
- return false;
196
- }
197
- qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
198
-
199
- tcg_ctx->code_gen_buffer = buf;
200
- return true;
201
-}
202
-#elif defined(_WIN32)
203
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
204
-{
205
- void *buf;
206
-
207
- if (splitwx > 0) {
208
- error_setg(errp, "jit split-wx not supported");
209
- return false;
210
- }
211
-
212
- buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
213
- PAGE_EXECUTE_READWRITE);
214
- if (buf == NULL) {
215
- error_setg_win32(errp, GetLastError(),
216
- "allocate %zu bytes for jit buffer", size);
217
- return false;
218
- }
219
-
220
- tcg_ctx->code_gen_buffer = buf;
221
- tcg_ctx->code_gen_buffer_size = size;
222
- return true;
223
-}
224
-#else
225
-static bool alloc_code_gen_buffer_anon(size_t size, int prot,
226
- int flags, Error **errp)
227
-{
228
- void *buf;
229
-
230
- buf = mmap(NULL, size, prot, flags, -1, 0);
231
- if (buf == MAP_FAILED) {
232
- error_setg_errno(errp, errno,
233
- "allocate %zu bytes for jit buffer", size);
234
- return false;
235
- }
236
- tcg_ctx->code_gen_buffer_size = size;
237
-
238
-#ifdef __mips__
239
- if (cross_256mb(buf, size)) {
240
- /*
241
- * Try again, with the original still mapped, to avoid re-acquiring
242
- * the same 256mb crossing.
243
- */
244
- size_t size2;
245
- void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
246
- switch ((int)(buf2 != MAP_FAILED)) {
247
- case 1:
248
- if (!cross_256mb(buf2, size)) {
249
- /* Success! Use the new buffer. */
250
- munmap(buf, size);
251
- break;
252
- }
253
- /* Failure. Work with what we had. */
254
- munmap(buf2, size);
255
- /* fallthru */
256
- default:
257
- /* Split the original buffer. Free the smaller half. */
258
- buf2 = split_cross_256mb(buf, size);
259
- size2 = tcg_ctx->code_gen_buffer_size;
260
- if (buf == buf2) {
261
- munmap(buf + size2, size - size2);
262
- } else {
263
- munmap(buf, size - size2);
264
- }
265
- size = size2;
266
- break;
267
- }
268
- buf = buf2;
269
- }
270
-#endif
271
-
272
- /* Request large pages for the buffer. */
273
- qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
274
-
275
- tcg_ctx->code_gen_buffer = buf;
276
- return true;
277
-}
278
-
279
-#ifndef CONFIG_TCG_INTERPRETER
280
-#ifdef CONFIG_POSIX
281
-#include "qemu/memfd.h"
282
-
283
-static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
284
-{
285
- void *buf_rw = NULL, *buf_rx = MAP_FAILED;
286
- int fd = -1;
287
-
288
-#ifdef __mips__
289
- /* Find space for the RX mapping, vs the 256MiB regions. */
290
- if (!alloc_code_gen_buffer_anon(size, PROT_NONE,
291
- MAP_PRIVATE | MAP_ANONYMOUS |
292
- MAP_NORESERVE, errp)) {
293
- return false;
294
- }
295
- /* The size of the mapping may have been adjusted. */
296
- size = tcg_ctx->code_gen_buffer_size;
297
- buf_rx = tcg_ctx->code_gen_buffer;
298
-#endif
299
-
300
- buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
301
- if (buf_rw == NULL) {
302
- goto fail;
303
- }
304
-
305
-#ifdef __mips__
306
- void *tmp = mmap(buf_rx, size, PROT_READ | PROT_EXEC,
307
- MAP_SHARED | MAP_FIXED, fd, 0);
308
- if (tmp != buf_rx) {
309
- goto fail_rx;
310
- }
311
-#else
312
- buf_rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
313
- if (buf_rx == MAP_FAILED) {
314
- goto fail_rx;
315
- }
316
-#endif
317
-
318
- close(fd);
319
- tcg_ctx->code_gen_buffer = buf_rw;
320
- tcg_ctx->code_gen_buffer_size = size;
321
- tcg_splitwx_diff = buf_rx - buf_rw;
322
-
323
- /* Request large pages for the buffer and the splitwx. */
324
- qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
325
- qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
326
- return true;
327
-
328
- fail_rx:
329
- error_setg_errno(errp, errno, "failed to map shared memory for execute");
330
- fail:
331
- if (buf_rx != MAP_FAILED) {
332
- munmap(buf_rx, size);
333
- }
334
- if (buf_rw) {
335
- munmap(buf_rw, size);
336
- }
337
- if (fd >= 0) {
338
- close(fd);
339
- }
340
- return false;
341
-}
342
-#endif /* CONFIG_POSIX */
343
-
344
-#ifdef CONFIG_DARWIN
345
-#include <mach/mach.h>
346
-
347
-extern kern_return_t mach_vm_remap(vm_map_t target_task,
348
- mach_vm_address_t *target_address,
349
- mach_vm_size_t size,
350
- mach_vm_offset_t mask,
351
- int flags,
352
- vm_map_t src_task,
353
- mach_vm_address_t src_address,
354
- boolean_t copy,
355
- vm_prot_t *cur_protection,
356
- vm_prot_t *max_protection,
357
- vm_inherit_t inheritance);
358
-
359
-static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
360
-{
361
- kern_return_t ret;
362
- mach_vm_address_t buf_rw, buf_rx;
363
- vm_prot_t cur_prot, max_prot;
364
-
365
- /* Map the read-write portion via normal anon memory. */
366
- if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE,
367
- MAP_PRIVATE | MAP_ANONYMOUS, errp)) {
368
- return false;
369
- }
370
-
371
- buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
372
- buf_rx = 0;
373
- ret = mach_vm_remap(mach_task_self(),
374
- &buf_rx,
375
- size,
376
- 0,
377
- VM_FLAGS_ANYWHERE,
378
- mach_task_self(),
379
- buf_rw,
380
- false,
381
- &cur_prot,
382
- &max_prot,
383
- VM_INHERIT_NONE);
384
- if (ret != KERN_SUCCESS) {
385
- /* TODO: Convert "ret" to a human readable error message. */
386
- error_setg(errp, "vm_remap for jit splitwx failed");
387
- munmap((void *)buf_rw, size);
388
- return false;
389
- }
390
-
391
- if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
392
- error_setg_errno(errp, errno, "mprotect for jit splitwx");
393
- munmap((void *)buf_rx, size);
394
- munmap((void *)buf_rw, size);
395
- return false;
396
- }
397
-
398
- tcg_splitwx_diff = buf_rx - buf_rw;
399
- return true;
400
-}
401
-#endif /* CONFIG_DARWIN */
402
-#endif /* CONFIG_TCG_INTERPRETER */
403
-
404
-static bool alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
405
-{
406
-#ifndef CONFIG_TCG_INTERPRETER
407
-# ifdef CONFIG_DARWIN
408
- return alloc_code_gen_buffer_splitwx_vmremap(size, errp);
409
-# endif
410
-# ifdef CONFIG_POSIX
411
- return alloc_code_gen_buffer_splitwx_memfd(size, errp);
412
-# endif
413
-#endif
414
- error_setg(errp, "jit split-wx not supported");
415
- return false;
416
-}
417
-
418
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
419
-{
420
- ERRP_GUARD();
421
- int prot, flags;
422
-
423
- if (splitwx) {
424
- if (alloc_code_gen_buffer_splitwx(size, errp)) {
425
- return true;
426
- }
427
- /*
428
- * If splitwx force-on (1), fail;
429
- * if splitwx default-on (-1), fall through to splitwx off.
430
- */
431
- if (splitwx > 0) {
432
- return false;
433
- }
434
- error_free_or_abort(errp);
435
- }
436
-
437
- prot = PROT_READ | PROT_WRITE | PROT_EXEC;
438
- flags = MAP_PRIVATE | MAP_ANONYMOUS;
439
-#ifdef CONFIG_TCG_INTERPRETER
440
- /* The tcg interpreter does not need execute permission. */
441
- prot = PROT_READ | PROT_WRITE;
442
-#elif defined(CONFIG_DARWIN)
443
- /* Applicable to both iOS and macOS (Apple Silicon). */
444
- if (!splitwx) {
445
- flags |= MAP_JIT;
446
- }
447
-#endif
448
-
449
- return alloc_code_gen_buffer_anon(size, prot, flags, errp);
450
-}
451
-#endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
452
-
453
static bool tb_cmp(const void *ap, const void *bp)
454
{
455
const TranslationBlock *a = ap;
456
@@ -XXX,XX +XXX,XX @@ static void tb_htable_init(void)
457
size. */
458
void tcg_exec_init(unsigned long tb_size, int splitwx)
459
{
460
- bool ok;
461
-
462
tcg_allowed = true;
463
tcg_context_init(&tcg_init_ctx);
464
page_init();
465
tb_htable_init();
466
-
467
- ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
468
- splitwx, &error_fatal);
469
- assert(ok);
470
-
471
- /* TODO: allocating regions is hand-in-glove with code_gen_buffer. */
472
- tcg_region_init();
473
+ tcg_region_init(tb_size, splitwx);
474
475
#if defined(CONFIG_SOFTMMU)
476
/* There's no guest base to take into account, so go ahead and
477
diff --git a/tcg/region.c b/tcg/region.c
478
index XXXXXXX..XXXXXXX 100644
479
--- a/tcg/region.c
480
+++ b/tcg/region.c
481
@@ -XXX,XX +XXX,XX @@
482
*/
483
484
#include "qemu/osdep.h"
485
+#include "qemu/units.h"
486
+#include "qapi/error.h"
487
#include "exec/exec-all.h"
488
#include "tcg/tcg.h"
489
#if !defined(CONFIG_USER_ONLY)
490
@@ -XXX,XX +XXX,XX @@ static size_t tcg_n_regions(void)
491
}
492
#endif
493
494
+/*
495
+ * Minimum size of the code gen buffer. This number is randomly chosen,
496
+ * but not so small that we can't have a fair number of TB's live.
497
+ */
498
+#define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
499
+
500
+/*
501
+ * Maximum size of the code gen buffer we'd like to use. Unless otherwise
502
+ * indicated, this is constrained by the range of direct branches on the
503
+ * host cpu, as used by the TCG implementation of goto_tb.
504
+ */
505
+#if defined(__x86_64__)
506
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
507
+#elif defined(__sparc__)
508
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
509
+#elif defined(__powerpc64__)
510
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
511
+#elif defined(__powerpc__)
512
+# define MAX_CODE_GEN_BUFFER_SIZE (32 * MiB)
513
+#elif defined(__aarch64__)
514
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
515
+#elif defined(__s390x__)
516
+ /* We have a +- 4GB range on the branches; leave some slop. */
517
+# define MAX_CODE_GEN_BUFFER_SIZE (3 * GiB)
518
+#elif defined(__mips__)
519
+ /*
520
+ * We have a 256MB branch region, but leave room to make sure the
521
+ * main executable is also within that region.
522
+ */
523
+# define MAX_CODE_GEN_BUFFER_SIZE (128 * MiB)
524
+#else
525
+# define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
526
+#endif
527
+
528
+#if TCG_TARGET_REG_BITS == 32
529
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
530
+#ifdef CONFIG_USER_ONLY
531
+/*
532
+ * For user mode on smaller 32 bit systems we may run into trouble
533
+ * allocating big chunks of data in the right place. On these systems
534
+ * we utilise a static code generation buffer directly in the binary.
535
+ */
536
+#define USE_STATIC_CODE_GEN_BUFFER
537
+#endif
538
+#else /* TCG_TARGET_REG_BITS == 64 */
539
+#ifdef CONFIG_USER_ONLY
540
+/*
541
+ * As user-mode emulation typically means running multiple instances
542
+ * of the translator don't go too nuts with our default code gen
543
+ * buffer lest we make things too hard for the OS.
544
+ */
545
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
546
+#else
547
+/*
548
+ * We expect most system emulation to run one or two guests per host.
549
+ * Users running large scale system emulation may want to tweak their
550
+ * runtime setup via the tb-size control on the command line.
551
+ */
552
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
553
+#endif
554
+#endif
555
+
556
+#define DEFAULT_CODE_GEN_BUFFER_SIZE \
557
+ (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
558
+ ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
559
+
560
+static size_t size_code_gen_buffer(size_t tb_size)
561
+{
562
+ /* Size the buffer. */
563
+ if (tb_size == 0) {
564
+ size_t phys_mem = qemu_get_host_physmem();
565
+ if (phys_mem == 0) {
566
+ tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
567
+ } else {
568
+ tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
569
+ }
570
+ }
571
+ if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
572
+ tb_size = MIN_CODE_GEN_BUFFER_SIZE;
573
+ }
574
+ if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
575
+ tb_size = MAX_CODE_GEN_BUFFER_SIZE;
576
+ }
577
+ return tb_size;
578
+}
579
+
580
+#ifdef __mips__
581
+/*
582
+ * In order to use J and JAL within the code_gen_buffer, we require
583
+ * that the buffer not cross a 256MB boundary.
584
+ */
585
+static inline bool cross_256mb(void *addr, size_t size)
586
+{
587
+ return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & ~0x0ffffffful;
588
+}
589
+
590
+/*
591
+ * We weren't able to allocate a buffer without crossing that boundary,
592
+ * so make do with the larger portion of the buffer that doesn't cross.
593
+ * Returns the new base of the buffer, and adjusts code_gen_buffer_size.
594
+ */
595
+static inline void *split_cross_256mb(void *buf1, size_t size1)
596
+{
597
+ void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
598
+ size_t size2 = buf1 + size1 - buf2;
599
+
600
+ size1 = buf2 - buf1;
601
+ if (size1 < size2) {
602
+ size1 = size2;
603
+ buf1 = buf2;
604
+ }
605
+
606
+ tcg_ctx->code_gen_buffer_size = size1;
607
+ return buf1;
608
+}
609
+#endif
610
+
611
+#ifdef USE_STATIC_CODE_GEN_BUFFER
612
+static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
613
+ __attribute__((aligned(CODE_GEN_ALIGN)));
614
+
615
+static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
616
+{
617
+ void *buf, *end;
618
+ size_t size;
619
+
620
+ if (splitwx > 0) {
621
+ error_setg(errp, "jit split-wx not supported");
622
+ return false;
623
+ }
624
+
625
+ /* page-align the beginning and end of the buffer */
626
+ buf = static_code_gen_buffer;
627
+ end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
628
+ buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
629
+ end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
630
+
631
+ size = end - buf;
632
+
633
+ /* Honor a command-line option limiting the size of the buffer. */
634
+ if (size > tb_size) {
635
+ size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
636
+ }
637
+ tcg_ctx->code_gen_buffer_size = size;
638
+
639
+#ifdef __mips__
640
+ if (cross_256mb(buf, size)) {
641
+ buf = split_cross_256mb(buf, size);
642
+ size = tcg_ctx->code_gen_buffer_size;
643
+ }
644
+#endif
645
+
646
+ if (qemu_mprotect_rwx(buf, size)) {
647
+ error_setg_errno(errp, errno, "mprotect of jit buffer");
648
+ return false;
649
+ }
650
+ qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
651
+
652
+ tcg_ctx->code_gen_buffer = buf;
653
+ return true;
654
+}
655
+#elif defined(_WIN32)
656
+static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
657
+{
658
+ void *buf;
659
+
660
+ if (splitwx > 0) {
661
+ error_setg(errp, "jit split-wx not supported");
662
+ return false;
663
+ }
664
+
665
+ buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
666
+ PAGE_EXECUTE_READWRITE);
667
+ if (buf == NULL) {
668
+ error_setg_win32(errp, GetLastError(),
669
+ "allocate %zu bytes for jit buffer", size);
670
+ return false;
671
+ }
672
+
673
+ tcg_ctx->code_gen_buffer = buf;
674
+ tcg_ctx->code_gen_buffer_size = size;
675
+ return true;
676
+}
677
+#else
678
+static bool alloc_code_gen_buffer_anon(size_t size, int prot,
679
+ int flags, Error **errp)
680
+{
681
+ void *buf;
682
+
683
+ buf = mmap(NULL, size, prot, flags, -1, 0);
684
+ if (buf == MAP_FAILED) {
685
+ error_setg_errno(errp, errno,
686
+ "allocate %zu bytes for jit buffer", size);
687
+ return false;
688
+ }
689
+ tcg_ctx->code_gen_buffer_size = size;
690
+
691
+#ifdef __mips__
692
+ if (cross_256mb(buf, size)) {
693
+ /*
694
+ * Try again, with the original still mapped, to avoid re-acquiring
695
+ * the same 256mb crossing.
696
+ */
697
+ size_t size2;
698
+ void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
699
+ switch ((int)(buf2 != MAP_FAILED)) {
700
+ case 1:
701
+ if (!cross_256mb(buf2, size)) {
702
+ /* Success! Use the new buffer. */
703
+ munmap(buf, size);
704
+ break;
705
+ }
706
+ /* Failure. Work with what we had. */
707
+ munmap(buf2, size);
708
+ /* fallthru */
709
+ default:
710
+ /* Split the original buffer. Free the smaller half. */
711
+ buf2 = split_cross_256mb(buf, size);
712
+ size2 = tcg_ctx->code_gen_buffer_size;
713
+ if (buf == buf2) {
714
+ munmap(buf + size2, size - size2);
715
+ } else {
716
+ munmap(buf, size - size2);
717
+ }
718
+ size = size2;
719
+ break;
720
+ }
721
+ buf = buf2;
722
+ }
723
+#endif
724
+
725
+ /* Request large pages for the buffer. */
726
+ qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
727
+
728
+ tcg_ctx->code_gen_buffer = buf;
729
+ return true;
730
+}
731
+
732
+#ifndef CONFIG_TCG_INTERPRETER
733
+#ifdef CONFIG_POSIX
734
+#include "qemu/memfd.h"
735
+
736
+static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
737
+{
738
+ void *buf_rw = NULL, *buf_rx = MAP_FAILED;
739
+ int fd = -1;
740
+
741
+#ifdef __mips__
742
+ /* Find space for the RX mapping, vs the 256MiB regions. */
743
+ if (!alloc_code_gen_buffer_anon(size, PROT_NONE,
744
+ MAP_PRIVATE | MAP_ANONYMOUS |
745
+ MAP_NORESERVE, errp)) {
746
+ return false;
747
+ }
748
+ /* The size of the mapping may have been adjusted. */
749
+ size = tcg_ctx->code_gen_buffer_size;
750
+ buf_rx = tcg_ctx->code_gen_buffer;
751
+#endif
752
+
753
+ buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
754
+ if (buf_rw == NULL) {
755
+ goto fail;
756
+ }
757
+
758
+#ifdef __mips__
759
+ void *tmp = mmap(buf_rx, size, PROT_READ | PROT_EXEC,
760
+ MAP_SHARED | MAP_FIXED, fd, 0);
761
+ if (tmp != buf_rx) {
762
+ goto fail_rx;
763
+ }
764
+#else
765
+ buf_rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
766
+ if (buf_rx == MAP_FAILED) {
767
+ goto fail_rx;
768
+ }
769
+#endif
770
+
771
+ close(fd);
772
+ tcg_ctx->code_gen_buffer = buf_rw;
773
+ tcg_ctx->code_gen_buffer_size = size;
774
+ tcg_splitwx_diff = buf_rx - buf_rw;
775
+
776
+ /* Request large pages for the buffer and the splitwx. */
777
+ qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
778
+ qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
779
+ return true;
780
+
781
+ fail_rx:
782
+ error_setg_errno(errp, errno, "failed to map shared memory for execute");
783
+ fail:
784
+ if (buf_rx != MAP_FAILED) {
785
+ munmap(buf_rx, size);
786
+ }
787
+ if (buf_rw) {
788
+ munmap(buf_rw, size);
789
+ }
790
+ if (fd >= 0) {
791
+ close(fd);
792
+ }
793
+ return false;
794
+}
795
+#endif /* CONFIG_POSIX */
796
+
797
+#ifdef CONFIG_DARWIN
798
+#include <mach/mach.h>
799
+
800
+extern kern_return_t mach_vm_remap(vm_map_t target_task,
801
+ mach_vm_address_t *target_address,
802
+ mach_vm_size_t size,
803
+ mach_vm_offset_t mask,
804
+ int flags,
805
+ vm_map_t src_task,
806
+ mach_vm_address_t src_address,
807
+ boolean_t copy,
808
+ vm_prot_t *cur_protection,
809
+ vm_prot_t *max_protection,
810
+ vm_inherit_t inheritance);
811
+
812
+static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
813
+{
814
+ kern_return_t ret;
815
+ mach_vm_address_t buf_rw, buf_rx;
816
+ vm_prot_t cur_prot, max_prot;
817
+
818
+ /* Map the read-write portion via normal anon memory. */
819
+ if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE,
820
+ MAP_PRIVATE | MAP_ANONYMOUS, errp)) {
821
+ return false;
822
+ }
823
+
824
+ buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
825
+ buf_rx = 0;
826
+ ret = mach_vm_remap(mach_task_self(),
827
+ &buf_rx,
828
+ size,
829
+ 0,
830
+ VM_FLAGS_ANYWHERE,
831
+ mach_task_self(),
832
+ buf_rw,
833
+ false,
834
+ &cur_prot,
835
+ &max_prot,
836
+ VM_INHERIT_NONE);
837
+ if (ret != KERN_SUCCESS) {
838
+ /* TODO: Convert "ret" to a human readable error message. */
839
+ error_setg(errp, "vm_remap for jit splitwx failed");
840
+ munmap((void *)buf_rw, size);
841
+ return false;
842
+ }
843
+
844
+ if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
845
+ error_setg_errno(errp, errno, "mprotect for jit splitwx");
846
+ munmap((void *)buf_rx, size);
847
+ munmap((void *)buf_rw, size);
848
+ return false;
849
+ }
850
+
851
+ tcg_splitwx_diff = buf_rx - buf_rw;
852
+ return true;
853
+}
854
+#endif /* CONFIG_DARWIN */
855
+#endif /* CONFIG_TCG_INTERPRETER */
856
+
857
+static bool alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
858
+{
859
+#ifndef CONFIG_TCG_INTERPRETER
860
+# ifdef CONFIG_DARWIN
861
+ return alloc_code_gen_buffer_splitwx_vmremap(size, errp);
862
+# endif
863
+# ifdef CONFIG_POSIX
864
+ return alloc_code_gen_buffer_splitwx_memfd(size, errp);
865
+# endif
866
+#endif
867
+ error_setg(errp, "jit split-wx not supported");
868
+ return false;
869
+}
870
+
871
+static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
872
+{
873
+ ERRP_GUARD();
874
+ int prot, flags;
875
+
876
+ if (splitwx) {
877
+ if (alloc_code_gen_buffer_splitwx(size, errp)) {
878
+ return true;
879
+ }
880
+ /*
881
+ * If splitwx force-on (1), fail;
882
+ * if splitwx default-on (-1), fall through to splitwx off.
883
+ */
884
+ if (splitwx > 0) {
885
+ return false;
886
+ }
887
+ error_free_or_abort(errp);
888
+ }
889
+
890
+ prot = PROT_READ | PROT_WRITE | PROT_EXEC;
891
+ flags = MAP_PRIVATE | MAP_ANONYMOUS;
892
+#ifdef CONFIG_TCG_INTERPRETER
893
+ /* The tcg interpreter does not need execute permission. */
894
+ prot = PROT_READ | PROT_WRITE;
895
+#elif defined(CONFIG_DARWIN)
896
+ /* Applicable to both iOS and macOS (Apple Silicon). */
897
+ if (!splitwx) {
898
+ flags |= MAP_JIT;
899
+ }
900
+#endif
901
+
902
+ return alloc_code_gen_buffer_anon(size, prot, flags, errp);
903
+}
904
+#endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
905
+
906
/*
907
* Initializes region partitioning.
908
*
909
@@ -XXX,XX +XXX,XX @@ static size_t tcg_n_regions(void)
910
* in practice. Multi-threaded guests share most if not all of their translated
911
* code, which makes parallel code generation less appealing than in softmmu.
912
*/
913
-void tcg_region_init(void)
914
+void tcg_region_init(size_t tb_size, int splitwx)
915
{
916
- void *buf = tcg_init_ctx.code_gen_buffer;
917
- void *aligned;
918
- size_t size = tcg_init_ctx.code_gen_buffer_size;
919
- size_t page_size = qemu_real_host_page_size;
920
+ void *buf, *aligned;
921
+ size_t size;
922
+ size_t page_size;
923
size_t region_size;
924
size_t n_regions;
925
size_t i;
926
+ bool ok;
927
928
+ ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
929
+ splitwx, &error_fatal);
930
+ assert(ok);
931
+
932
+ buf = tcg_init_ctx.code_gen_buffer;
933
+ size = tcg_init_ctx.code_gen_buffer_size;
934
+ page_size = qemu_real_host_page_size;
935
n_regions = tcg_n_regions();
936
937
/* The first region will be 'aligned - buf' bytes larger than the others */
938
--
939
2.25.1
940
941
New patch
1
We shortly want to use tcg_init for something else.
2
Since the hook is called init_machine, match that.
1
3
4
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
5
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
9
accel/tcg/tcg-all.c | 4 ++--
10
1 file changed, 2 insertions(+), 2 deletions(-)
11
12
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/accel/tcg/tcg-all.c
15
+++ b/accel/tcg/tcg-all.c
16
@@ -XXX,XX +XXX,XX @@ static void tcg_accel_instance_init(Object *obj)
17
18
bool mttcg_enabled;
19
20
-static int tcg_init(MachineState *ms)
21
+static int tcg_init_machine(MachineState *ms)
22
{
23
TCGState *s = TCG_STATE(current_accel());
24
25
@@ -XXX,XX +XXX,XX @@ static void tcg_accel_class_init(ObjectClass *oc, void *data)
26
{
27
AccelClass *ac = ACCEL_CLASS(oc);
28
ac->name = "tcg";
29
- ac->init_machine = tcg_init;
30
+ ac->init_machine = tcg_init_machine;
31
ac->allowed = &tcg_allowed;
32
33
object_class_property_add_str(oc, "thread",
34
--
35
2.25.1
36
37
1
We will want to be able to flush a tlb without resizing.
1
Perform both tcg_context_init and tcg_region_init.
2
Do not leave this split to the caller.
2
3
4
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
3
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
4
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
5
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
---
7
---
8
accel/tcg/cputlb.c | 15 ++++++++++-----
8
include/tcg/tcg.h | 3 +--
9
1 file changed, 10 insertions(+), 5 deletions(-)
9
tcg/tcg-internal.h | 1 +
10
accel/tcg/translate-all.c | 3 +--
11
tcg/tcg.c | 9 ++++++++-
12
4 files changed, 11 insertions(+), 5 deletions(-)
10
13
11
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
14
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
12
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
13
--- a/accel/tcg/cputlb.c
16
--- a/include/tcg/tcg.h
14
+++ b/accel/tcg/cputlb.c
17
+++ b/include/tcg/tcg.h
15
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
18
@@ -XXX,XX +XXX,XX @@ void *tcg_malloc_internal(TCGContext *s, int size);
19
void tcg_pool_reset(TCGContext *s);
20
TranslationBlock *tcg_tb_alloc(TCGContext *s);
21
22
-void tcg_region_init(size_t tb_size, int splitwx);
23
void tb_destroy(TranslationBlock *tb);
24
void tcg_region_reset_all(void);
25
26
@@ -XXX,XX +XXX,XX @@ static inline void *tcg_malloc(int size)
16
}
27
}
17
}
28
}
18
29
19
-static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
30
-void tcg_context_init(TCGContext *s);
20
+static void tlb_mmu_flush_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
31
+void tcg_init(size_t tb_size, int splitwx);
32
void tcg_register_thread(void);
33
void tcg_prologue_init(TCGContext *s);
34
void tcg_func_start(TCGContext *s);
35
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
36
index XXXXXXX..XXXXXXX 100644
37
--- a/tcg/tcg-internal.h
38
+++ b/tcg/tcg-internal.h
39
@@ -XXX,XX +XXX,XX @@
40
extern TCGContext **tcg_ctxs;
41
extern unsigned int n_tcg_ctxs;
42
43
+void tcg_region_init(size_t tb_size, int splitwx);
44
bool tcg_region_alloc(TCGContext *s);
45
void tcg_region_initial_alloc(TCGContext *s);
46
void tcg_region_prologue_set(TCGContext *s);
47
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/accel/tcg/translate-all.c
50
+++ b/accel/tcg/translate-all.c
51
@@ -XXX,XX +XXX,XX @@ static void tb_htable_init(void)
52
void tcg_exec_init(unsigned long tb_size, int splitwx)
21
{
53
{
22
- CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
54
tcg_allowed = true;
23
- CPUTLBDescFast *fast = &env_tlb(env)->f[mmu_idx];
55
- tcg_context_init(&tcg_init_ctx);
24
-
56
page_init();
25
- tlb_mmu_resize_locked(desc, fast);
57
tb_htable_init();
26
desc->n_used_entries = 0;
58
- tcg_region_init(tb_size, splitwx);
27
desc->large_page_addr = -1;
59
+ tcg_init(tb_size, splitwx);
28
desc->large_page_mask = -1;
60
29
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
61
#if defined(CONFIG_SOFTMMU)
30
memset(desc->vtable, -1, sizeof(desc->vtable));
62
/* There's no guest base to take into account, so go ahead and
63
diff --git a/tcg/tcg.c b/tcg/tcg.c
64
index XXXXXXX..XXXXXXX 100644
65
--- a/tcg/tcg.c
66
+++ b/tcg/tcg.c
67
@@ -XXX,XX +XXX,XX @@ static void process_op_defs(TCGContext *s);
68
static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
69
TCGReg reg, const char *name);
70
71
-void tcg_context_init(TCGContext *s)
72
+static void tcg_context_init(void)
73
{
74
+ TCGContext *s = &tcg_init_ctx;
75
int op, total_args, n, i;
76
TCGOpDef *def;
77
TCGArgConstraint *args_ct;
78
@@ -XXX,XX +XXX,XX @@ void tcg_context_init(TCGContext *s)
79
cpu_env = temp_tcgv_ptr(ts);
31
}
80
}
32
81
33
+static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
82
+void tcg_init(size_t tb_size, int splitwx)
34
+{
83
+{
35
+ CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
84
+ tcg_context_init();
36
+ CPUTLBDescFast *fast = &env_tlb(env)->f[mmu_idx];
85
+ tcg_region_init(tb_size, splitwx);
37
+
38
+ tlb_mmu_resize_locked(desc, fast);
39
+ tlb_mmu_flush_locked(desc, fast);
40
+}
86
+}
41
+
87
+
42
static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
88
/*
43
{
89
* Allocate TBs right before their corresponding translated code, making
44
env_tlb(env)->d[mmu_idx].n_used_entries++;
90
* sure that TBs and code are on different cache lines.
45
--
91
--
46
2.20.1
92
2.25.1
47
93
48
94
1
There is only one caller for tlb_table_flush_by_mmuidx. Place
1
There is only one caller, and shortly we will need access
2
the result at the earlier line number, due to an expected user
2
to the MachineState, which tcg_init_machine already has.
3
in the near future.
4
3
4
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
5
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
7
---
9
accel/tcg/cputlb.c | 19 +++++++------------
8
accel/tcg/internal.h | 2 ++
10
1 file changed, 7 insertions(+), 12 deletions(-)
9
include/sysemu/tcg.h | 2 --
10
accel/tcg/tcg-all.c | 16 +++++++++++++++-
11
accel/tcg/translate-all.c | 21 ++-------------------
12
bsd-user/main.c | 2 +-
13
5 files changed, 20 insertions(+), 23 deletions(-)
11
14
12
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
15
diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
13
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
14
--- a/accel/tcg/cputlb.c
17
--- a/accel/tcg/internal.h
15
+++ b/accel/tcg/cputlb.c
18
+++ b/accel/tcg/internal.h
16
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
19
@@ -XXX,XX +XXX,XX @@ TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
17
}
20
int cflags);
21
22
void QEMU_NORETURN cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
23
+void page_init(void);
24
+void tb_htable_init(void);
25
26
#endif /* ACCEL_TCG_INTERNAL_H */
27
diff --git a/include/sysemu/tcg.h b/include/sysemu/tcg.h
28
index XXXXXXX..XXXXXXX 100644
29
--- a/include/sysemu/tcg.h
30
+++ b/include/sysemu/tcg.h
31
@@ -XXX,XX +XXX,XX @@
32
#ifndef SYSEMU_TCG_H
33
#define SYSEMU_TCG_H
34
35
-void tcg_exec_init(unsigned long tb_size, int splitwx);
36
-
37
#ifdef CONFIG_TCG
38
extern bool tcg_allowed;
39
#define tcg_enabled() (tcg_allowed)
40
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/accel/tcg/tcg-all.c
43
+++ b/accel/tcg/tcg-all.c
44
@@ -XXX,XX +XXX,XX @@
45
#include "qemu/error-report.h"
46
#include "qemu/accel.h"
47
#include "qapi/qapi-builtin-visit.h"
48
+#include "internal.h"
49
50
struct TCGState {
51
AccelState parent_obj;
52
@@ -XXX,XX +XXX,XX @@ static int tcg_init_machine(MachineState *ms)
53
{
54
TCGState *s = TCG_STATE(current_accel());
55
56
- tcg_exec_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
57
+ tcg_allowed = true;
58
mttcg_enabled = s->mttcg_enabled;
59
+
60
+ page_init();
61
+ tb_htable_init();
62
+ tcg_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
63
+
64
+#if defined(CONFIG_SOFTMMU)
65
+ /*
66
+ * There's no guest base to take into account, so go ahead and
67
+ * initialize the prologue now.
68
+ */
69
+ tcg_prologue_init(tcg_ctx);
70
+#endif
71
+
72
return 0;
18
}
73
}
19
74
20
-static inline void tlb_table_flush_by_mmuidx(CPUArchState *env, int mmu_idx)
75
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
21
+static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
76
index XXXXXXX..XXXXXXX 100644
77
--- a/accel/tcg/translate-all.c
78
+++ b/accel/tcg/translate-all.c
79
@@ -XXX,XX +XXX,XX @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
80
return false;
81
}
82
83
-static void page_init(void)
84
+void page_init(void)
22
{
85
{
23
tlb_mmu_resize_locked(env, mmu_idx);
86
page_size_init();
24
- memset(env_tlb(env)->f[mmu_idx].table, -1, sizeof_tlb(env, mmu_idx));
87
page_table_config_init();
25
env_tlb(env)->d[mmu_idx].n_used_entries = 0;
88
@@ -XXX,XX +XXX,XX @@ static bool tb_cmp(const void *ap, const void *bp)
26
+ env_tlb(env)->d[mmu_idx].large_page_addr = -1;
89
a->page_addr[1] == b->page_addr[1];
27
+ env_tlb(env)->d[mmu_idx].large_page_mask = -1;
28
+ env_tlb(env)->d[mmu_idx].vindex = 0;
29
+ memset(env_tlb(env)->f[mmu_idx].table, -1, sizeof_tlb(env, mmu_idx));
30
+ memset(env_tlb(env)->d[mmu_idx].vtable, -1,
31
+ sizeof(env_tlb(env)->d[0].vtable));
32
}
90
}
33
91
34
static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
92
-static void tb_htable_init(void)
35
@@ -XXX,XX +XXX,XX @@ void tlb_flush_counts(size_t *pfull, size_t *ppart, size_t *pelide)
93
+void tb_htable_init(void)
36
*pelide = elide;
94
{
95
unsigned int mode = QHT_MODE_AUTO_RESIZE;
96
97
qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
37
}
98
}
38
99
39
-static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
100
-/* Must be called before using the QEMU cpus. 'tb_size' is the size
101
- (in bytes) allocated to the translation buffer. Zero means default
102
- size. */
103
-void tcg_exec_init(unsigned long tb_size, int splitwx)
40
-{
104
-{
41
- tlb_table_flush_by_mmuidx(env, mmu_idx);
105
- tcg_allowed = true;
42
- env_tlb(env)->d[mmu_idx].large_page_addr = -1;
106
- page_init();
43
- env_tlb(env)->d[mmu_idx].large_page_mask = -1;
107
- tb_htable_init();
44
- env_tlb(env)->d[mmu_idx].vindex = 0;
108
- tcg_init(tb_size, splitwx);
45
- memset(env_tlb(env)->d[mmu_idx].vtable, -1,
109
-
46
- sizeof(env_tlb(env)->d[0].vtable));
110
-#if defined(CONFIG_SOFTMMU)
111
- /* There's no guest base to take into account, so go ahead and
112
- initialize the prologue now. */
113
- tcg_prologue_init(tcg_ctx);
114
-#endif
47
-}
115
-}
48
-
116
-
49
static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
117
/* call with @p->lock held */
118
static inline void invalidate_page_bitmap(PageDesc *p)
50
{
119
{
51
CPUArchState *env = cpu->env_ptr;
120
diff --git a/bsd-user/main.c b/bsd-user/main.c
121
index XXXXXXX..XXXXXXX 100644
122
--- a/bsd-user/main.c
123
+++ b/bsd-user/main.c
124
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
125
envlist_free(envlist);
126
127
/*
128
- * Now that page sizes are configured in tcg_exec_init() we can do
129
+ * Now that page sizes are configured we can do
130
* proper page alignment for guest_base.
131
*/
132
guest_base = HOST_PAGE_ALIGN(guest_base);
52
--
133
--
53
2.20.1
134
2.25.1
54
135
55
136
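Setting aside the old-series lines interleaved in the diff above, the machine hook now drives the whole start-up sequence itself. A stub model of the resulting ordering (function names follow the diff, signatures are simplified and bodies are placeholders):

    #include <stdio.h>
    #include <stddef.h>

    static void page_init(void)       { puts("page_init"); }
    static void tb_htable_init(void)  { puts("tb_htable_init"); }
    static void tcg_init(size_t tb_size, int splitwx)
    {
        printf("tcg_init(%zu, %d)\n", tb_size, splitwx);
    }
    static void tcg_prologue_init(void) { puts("tcg_prologue_init"); }

    static int tcg_init_machine(size_t tb_size_mb, int splitwx)
    {
        page_init();                                    /* page sizes first */
        tb_htable_init();                               /* TB hash table */
        tcg_init(tb_size_mb * 1024 * 1024, splitwx);    /* contexts + regions */
        tcg_prologue_init();                            /* softmmu: prologue right away */
        return 0;
    }

    int main(void)
    {
        return tcg_init_machine(0, -1);
    }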
New patch
1
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
3
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
---
6
accel/tcg/tcg-all.c | 3 ++-
7
1 file changed, 2 insertions(+), 1 deletion(-)
1
8
9
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
10
index XXXXXXX..XXXXXXX 100644
11
--- a/accel/tcg/tcg-all.c
12
+++ b/accel/tcg/tcg-all.c
13
@@ -XXX,XX +XXX,XX @@
14
#include "qemu/error-report.h"
15
#include "qemu/accel.h"
16
#include "qapi/qapi-builtin-visit.h"
17
+#include "qemu/units.h"
18
#include "internal.h"
19
20
struct TCGState {
21
@@ -XXX,XX +XXX,XX @@ static int tcg_init_machine(MachineState *ms)
22
23
page_init();
24
tb_htable_init();
25
- tcg_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
26
+ tcg_init(s->tb_size * MiB, s->splitwx_enabled);
27
28
#if defined(CONFIG_SOFTMMU)
29
/*
30
--
31
2.25.1
32
33
1
Merge into the only caller, but at the same time split
1
Start removing the include of hw/boards.h from tcg/.
2
out tlb_mmu_init to initialize a single tlb entry.
2
Pass down the max_cpus value from tcg_init_machine,
3
where we have the MachineState already.
3
4
5
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
4
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
6
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
9
---
9
accel/tcg/cputlb.c | 33 ++++++++++++++++-----------------
10
include/tcg/tcg.h | 2 +-
10
1 file changed, 16 insertions(+), 17 deletions(-)
11
tcg/tcg-internal.h | 2 +-
12
accel/tcg/tcg-all.c | 10 +++++++++-
13
tcg/region.c | 32 +++++++++++---------------------
14
tcg/tcg.c | 10 ++++------
15
5 files changed, 26 insertions(+), 30 deletions(-)
11
16
12
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
17
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
13
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
14
--- a/accel/tcg/cputlb.c
19
--- a/include/tcg/tcg.h
15
+++ b/accel/tcg/cputlb.c
20
+++ b/include/tcg/tcg.h
16
@@ -XXX,XX +XXX,XX @@ static void tlb_window_reset(CPUTLBDesc *desc, int64_t ns,
21
@@ -XXX,XX +XXX,XX @@ static inline void *tcg_malloc(int size)
17
desc->window_max_entries = max_entries;
22
}
18
}
23
}
19
24
20
-static void tlb_dyn_init(CPUArchState *env)
25
-void tcg_init(size_t tb_size, int splitwx);
26
+void tcg_init(size_t tb_size, int splitwx, unsigned max_cpus);
27
void tcg_register_thread(void);
28
void tcg_prologue_init(TCGContext *s);
29
void tcg_func_start(TCGContext *s);
30
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
31
index XXXXXXX..XXXXXXX 100644
32
--- a/tcg/tcg-internal.h
33
+++ b/tcg/tcg-internal.h
34
@@ -XXX,XX +XXX,XX @@
35
extern TCGContext **tcg_ctxs;
36
extern unsigned int n_tcg_ctxs;
37
38
-void tcg_region_init(size_t tb_size, int splitwx);
39
+void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus);
40
bool tcg_region_alloc(TCGContext *s);
41
void tcg_region_initial_alloc(TCGContext *s);
42
void tcg_region_prologue_set(TCGContext *s);
43
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
44
index XXXXXXX..XXXXXXX 100644
45
--- a/accel/tcg/tcg-all.c
46
+++ b/accel/tcg/tcg-all.c
47
@@ -XXX,XX +XXX,XX @@
48
#include "qemu/accel.h"
49
#include "qapi/qapi-builtin-visit.h"
50
#include "qemu/units.h"
51
+#if !defined(CONFIG_USER_ONLY)
52
+#include "hw/boards.h"
53
+#endif
54
#include "internal.h"
55
56
struct TCGState {
57
@@ -XXX,XX +XXX,XX @@ bool mttcg_enabled;
58
static int tcg_init_machine(MachineState *ms)
59
{
60
TCGState *s = TCG_STATE(current_accel());
61
+#ifdef CONFIG_USER_ONLY
62
+ unsigned max_cpus = 1;
63
+#else
64
+ unsigned max_cpus = ms->smp.max_cpus;
65
+#endif
66
67
tcg_allowed = true;
68
mttcg_enabled = s->mttcg_enabled;
69
70
page_init();
71
tb_htable_init();
72
- tcg_init(s->tb_size * MiB, s->splitwx_enabled);
73
+ tcg_init(s->tb_size * MiB, s->splitwx_enabled, max_cpus);
74
75
#if defined(CONFIG_SOFTMMU)
76
/*
77
diff --git a/tcg/region.c b/tcg/region.c
78
index XXXXXXX..XXXXXXX 100644
79
--- a/tcg/region.c
80
+++ b/tcg/region.c
81
@@ -XXX,XX +XXX,XX @@
82
#include "qapi/error.h"
83
#include "exec/exec-all.h"
84
#include "tcg/tcg.h"
85
-#if !defined(CONFIG_USER_ONLY)
86
-#include "hw/boards.h"
87
-#endif
88
#include "tcg-internal.h"
89
90
91
@@ -XXX,XX +XXX,XX @@ void tcg_region_reset_all(void)
92
tcg_region_tree_reset_all();
93
}
94
95
+static size_t tcg_n_regions(unsigned max_cpus)
96
+{
97
#ifdef CONFIG_USER_ONLY
98
-static size_t tcg_n_regions(void)
21
-{
99
-{
22
- int i;
100
return 1;
23
-
24
- for (i = 0; i < NB_MMU_MODES; i++) {
25
- CPUTLBDesc *desc = &env_tlb(env)->d[i];
26
- size_t n_entries = 1 << CPU_TLB_DYN_DEFAULT_BITS;
27
-
28
- tlb_window_reset(desc, get_clock_realtime(), 0);
29
- desc->n_used_entries = 0;
30
- env_tlb(env)->f[i].mask = (n_entries - 1) << CPU_TLB_ENTRY_BITS;
31
- env_tlb(env)->f[i].table = g_new(CPUTLBEntry, n_entries);
32
- env_tlb(env)->d[i].iotlb = g_new(CPUIOTLBEntry, n_entries);
33
- }
34
-}
101
-}
35
-
102
#else
36
/**
103
-/*
37
* tlb_mmu_resize_locked() - perform TLB resize bookkeeping; resize if necessary
104
- * It is likely that some vCPUs will translate more code than others, so we
38
* @desc: The CPUTLBDesc portion of the TLB
105
- * first try to set more regions than max_cpus, with those regions being of
39
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
106
- * reasonable size. If that's not possible we make do by evenly dividing
40
tlb_mmu_flush_locked(desc, fast);
107
- * the code_gen_buffer among the vCPUs.
108
- */
109
-static size_t tcg_n_regions(void)
110
-{
111
+ /*
112
+ * It is likely that some vCPUs will translate more code than others,
113
+ * so we first try to set more regions than max_cpus, with those regions
114
+ * being of reasonable size. If that's not possible we make do by evenly
115
+ * dividing the code_gen_buffer among the vCPUs.
116
+ */
117
size_t i;
118
119
/* Use a single region if all we have is one vCPU thread */
120
-#if !defined(CONFIG_USER_ONLY)
121
- MachineState *ms = MACHINE(qdev_get_machine());
122
- unsigned int max_cpus = ms->smp.max_cpus;
123
-#endif
124
if (max_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
125
return 1;
126
}
127
@@ -XXX,XX +XXX,XX @@ static size_t tcg_n_regions(void)
128
}
129
/* If we can't, then just allocate one region per vCPU thread */
130
return max_cpus;
131
-}
132
#endif
133
+}
134
135
/*
136
* Minimum size of the code gen buffer. This number is randomly chosen,
137
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
138
* in practice. Multi-threaded guests share most if not all of their translated
139
* code, which makes parallel code generation less appealing than in softmmu.
140
*/
141
-void tcg_region_init(size_t tb_size, int splitwx)
142
+void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
143
{
144
void *buf, *aligned;
145
size_t size;
146
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx)
147
buf = tcg_init_ctx.code_gen_buffer;
148
size = tcg_init_ctx.code_gen_buffer_size;
149
page_size = qemu_real_host_page_size;
150
- n_regions = tcg_n_regions();
151
+ n_regions = tcg_n_regions(max_cpus);
152
153
/* The first region will be 'aligned - buf' bytes larger than the others */
154
aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
155
diff --git a/tcg/tcg.c b/tcg/tcg.c
156
index XXXXXXX..XXXXXXX 100644
157
--- a/tcg/tcg.c
158
+++ b/tcg/tcg.c
159
@@ -XXX,XX +XXX,XX @@ static void process_op_defs(TCGContext *s);
160
static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
161
TCGReg reg, const char *name);
162
163
-static void tcg_context_init(void)
164
+static void tcg_context_init(unsigned max_cpus)
165
{
166
TCGContext *s = &tcg_init_ctx;
167
int op, total_args, n, i;
168
@@ -XXX,XX +XXX,XX @@ static void tcg_context_init(void)
169
tcg_ctxs = &tcg_ctx;
170
n_tcg_ctxs = 1;
171
#else
172
- MachineState *ms = MACHINE(qdev_get_machine());
173
- unsigned int max_cpus = ms->smp.max_cpus;
174
tcg_ctxs = g_new(TCGContext *, max_cpus);
175
#endif
176
177
@@ -XXX,XX +XXX,XX @@ static void tcg_context_init(void)
178
cpu_env = temp_tcgv_ptr(ts);
41
}
179
}
42
180
43
+static void tlb_mmu_init(CPUTLBDesc *desc, CPUTLBDescFast *fast, int64_t now)
181
-void tcg_init(size_t tb_size, int splitwx)
44
+{
182
+void tcg_init(size_t tb_size, int splitwx, unsigned max_cpus)
45
+ size_t n_entries = 1 << CPU_TLB_DYN_DEFAULT_BITS;
46
+
47
+ tlb_window_reset(desc, now, 0);
48
+ desc->n_used_entries = 0;
49
+ fast->mask = (n_entries - 1) << CPU_TLB_ENTRY_BITS;
50
+ fast->table = g_new(CPUTLBEntry, n_entries);
51
+ desc->iotlb = g_new(CPUIOTLBEntry, n_entries);
52
+}
53
+
54
static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
55
{
183
{
56
env_tlb(env)->d[mmu_idx].n_used_entries++;
184
- tcg_context_init();
57
@@ -XXX,XX +XXX,XX @@ static inline void tlb_n_used_entries_dec(CPUArchState *env, uintptr_t mmu_idx)
185
- tcg_region_init(tb_size, splitwx);
58
void tlb_init(CPUState *cpu)
186
+ tcg_context_init(max_cpus);
59
{
187
+ tcg_region_init(tb_size, splitwx, max_cpus);
60
CPUArchState *env = cpu->env_ptr;
61
+ int64_t now = get_clock_realtime();
62
+ int i;
63
64
qemu_spin_init(&env_tlb(env)->c.lock);
65
66
/* Ensure that cpu_reset performs a full flush. */
67
env_tlb(env)->c.dirty = ALL_MMUIDX_BITS;
68
69
- tlb_dyn_init(env);
70
+ for (i = 0; i < NB_MMU_MODES; i++) {
71
+ tlb_mmu_init(&env_tlb(env)->d[i], &env_tlb(env)->f[i], now);
72
+ }
73
}
188
}
74
189
75
/* flush_all_helper: run fn across all cpus
190
/*
76
--
191
--
77
2.20.1
192
2.25.1
78
193
79
194
New patch
1
Finish the divorce of tcg/ from hw/, and do not take
2
the max cpu value from MachineState; just remember what
3
we were passed in tcg_init.
1
4
5
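A minimal model of the new bookkeeping follows; plain counters stand in for QEMU's atomics, and it is illustrative only:

    #include <assert.h>
    #include <stdio.h>

    /* Remembered from the value passed to tcg_init(); this is what replaces
       the ms->smp.max_cpus lookup. */
    static unsigned tcg_max_ctxs;
    static unsigned tcg_cur_ctxs;

    static void tcg_init_model(unsigned max_cpus)
    {
        tcg_max_ctxs = max_cpus;
    }

    static unsigned register_thread_model(void)
    {
        unsigned n = tcg_cur_ctxs++;    /* QEMU uses qatomic_fetch_inc() here */
        assert(n < tcg_max_ctxs);       /* never more contexts than vCPU threads */
        return n;
    }

    int main(void)
    {
        tcg_init_model(4);
        unsigned a = register_thread_model();
        unsigned b = register_thread_model();
        printf("claimed context slots %u and %u\n", a, b);
        return 0;
    }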
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
6
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
---
10
tcg/tcg-internal.h | 3 ++-
11
tcg/region.c | 6 +++---
12
tcg/tcg.c | 23 ++++++++++-------------
13
3 files changed, 15 insertions(+), 17 deletions(-)
14
15
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
16
index XXXXXXX..XXXXXXX 100644
17
--- a/tcg/tcg-internal.h
18
+++ b/tcg/tcg-internal.h
19
@@ -XXX,XX +XXX,XX @@
20
#define TCG_HIGHWATER 1024
21
22
extern TCGContext **tcg_ctxs;
23
-extern unsigned int n_tcg_ctxs;
24
+extern unsigned int tcg_cur_ctxs;
25
+extern unsigned int tcg_max_ctxs;
26
27
void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus);
28
bool tcg_region_alloc(TCGContext *s);
29
diff --git a/tcg/region.c b/tcg/region.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/tcg/region.c
32
+++ b/tcg/region.c
33
@@ -XXX,XX +XXX,XX @@ void tcg_region_initial_alloc(TCGContext *s)
34
/* Call from a safe-work context */
35
void tcg_region_reset_all(void)
36
{
37
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
38
+ unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
39
unsigned int i;
40
41
qemu_mutex_lock(&region.lock);
42
@@ -XXX,XX +XXX,XX @@ void tcg_region_prologue_set(TCGContext *s)
43
*/
44
size_t tcg_code_size(void)
45
{
46
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
47
+ unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
48
unsigned int i;
49
size_t total;
50
51
@@ -XXX,XX +XXX,XX @@ size_t tcg_code_capacity(void)
52
53
size_t tcg_tb_phys_invalidate_count(void)
54
{
55
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
56
+ unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
57
unsigned int i;
58
size_t total = 0;
59
60
diff --git a/tcg/tcg.c b/tcg/tcg.c
61
index XXXXXXX..XXXXXXX 100644
62
--- a/tcg/tcg.c
63
+++ b/tcg/tcg.c
64
@@ -XXX,XX +XXX,XX @@
65
#define NO_CPU_IO_DEFS
66
67
#include "exec/exec-all.h"
68
-
69
-#if !defined(CONFIG_USER_ONLY)
70
-#include "hw/boards.h"
71
-#endif
72
-
73
#include "tcg/tcg-op.h"
74
75
#if UINTPTR_MAX == UINT32_MAX
76
@@ -XXX,XX +XXX,XX @@ static int tcg_out_ldst_finalize(TCGContext *s);
77
#endif
78
79
TCGContext **tcg_ctxs;
80
-unsigned int n_tcg_ctxs;
81
+unsigned int tcg_cur_ctxs;
82
+unsigned int tcg_max_ctxs;
83
TCGv_env cpu_env = 0;
84
const void *tcg_code_gen_epilogue;
85
uintptr_t tcg_splitwx_diff;
86
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
87
#else
88
void tcg_register_thread(void)
89
{
90
- MachineState *ms = MACHINE(qdev_get_machine());
91
TCGContext *s = g_malloc(sizeof(*s));
92
unsigned int i, n;
93
94
@@ -XXX,XX +XXX,XX @@ void tcg_register_thread(void)
95
}
96
97
/* Claim an entry in tcg_ctxs */
98
- n = qatomic_fetch_inc(&n_tcg_ctxs);
99
- g_assert(n < ms->smp.max_cpus);
100
+ n = qatomic_fetch_inc(&tcg_cur_ctxs);
101
+ g_assert(n < tcg_max_ctxs);
102
qatomic_set(&tcg_ctxs[n], s);
103
104
if (n > 0) {
105
@@ -XXX,XX +XXX,XX @@ static void tcg_context_init(unsigned max_cpus)
106
*/
107
#ifdef CONFIG_USER_ONLY
108
tcg_ctxs = &tcg_ctx;
109
- n_tcg_ctxs = 1;
110
+ tcg_cur_ctxs = 1;
111
+ tcg_max_ctxs = 1;
112
#else
113
- tcg_ctxs = g_new(TCGContext *, max_cpus);
114
+ tcg_max_ctxs = max_cpus;
115
+ tcg_ctxs = g_new0(TCGContext *, max_cpus);
116
#endif
117
118
tcg_debug_assert(!tcg_regset_test_reg(s->reserved_regs, TCG_AREG0));
119
@@ -XXX,XX +XXX,XX @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
120
static inline
121
void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
122
{
123
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
124
+ unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
125
unsigned int i;
126
127
for (i = 0; i < n_ctxs; i++) {
128
@@ -XXX,XX +XXX,XX @@ void tcg_dump_op_count(void)
129
130
int64_t tcg_cpu_exec_time(void)
131
{
132
- unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
133
+ unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
134
unsigned int i;
135
int64_t ret = 0;
136
137
--
138
2.25.1
139
140
diff view generated by jsdifflib
New patch

Remove the ifdef ladder and move each define into the
appropriate header file.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/aarch64/tcg-target.h | 1 +
tcg/arm/tcg-target.h | 1 +
tcg/i386/tcg-target.h | 2 ++
tcg/mips/tcg-target.h | 6 ++++++
tcg/ppc/tcg-target.h | 2 ++
tcg/riscv/tcg-target.h | 1 +
tcg/s390/tcg-target.h | 3 +++
tcg/sparc/tcg-target.h | 1 +
tcg/tci/tcg-target.h | 1 +
tcg/region.c | 33 +++++----------------------------
10 files changed, 23 insertions(+), 28 deletions(-)

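For context (commentary, not part of the patch): the per-target constant is consumed as a plain clamp on the requested buffer size. A rough standalone sketch, with illustrative values rather than any particular target's:

  #include <stddef.h>

  #define MiB ((size_t)1 << 20)
  #define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
  /* stand-in for a per-target limit, e.g. 2 GiB where direct branches reach that far */
  #define MAX_CODE_GEN_BUFFER_SIZE ((size_t)2 << 30)

  static size_t clamp_tb_size(size_t tb_size)
  {
      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
          tb_size = MIN_CODE_GEN_BUFFER_SIZE;
      }
      if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
          tb_size = MAX_CODE_GEN_BUFFER_SIZE;
      }
      return tb_size;
  }
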
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
21
index XXXXXXX..XXXXXXX 100644
22
--- a/tcg/aarch64/tcg-target.h
23
+++ b/tcg/aarch64/tcg-target.h
24
@@ -XXX,XX +XXX,XX @@
25
26
#define TCG_TARGET_INSN_UNIT_SIZE 4
27
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 24
28
+#define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
29
#undef TCG_TARGET_STACK_GROWSUP
30
31
typedef enum {
32
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
33
index XXXXXXX..XXXXXXX 100644
34
--- a/tcg/arm/tcg-target.h
35
+++ b/tcg/arm/tcg-target.h
36
@@ -XXX,XX +XXX,XX @@ extern int arm_arch;
37
#undef TCG_TARGET_STACK_GROWSUP
38
#define TCG_TARGET_INSN_UNIT_SIZE 4
39
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
40
+#define MAX_CODE_GEN_BUFFER_SIZE UINT32_MAX
41
42
typedef enum {
43
TCG_REG_R0 = 0,
44
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
45
index XXXXXXX..XXXXXXX 100644
46
--- a/tcg/i386/tcg-target.h
47
+++ b/tcg/i386/tcg-target.h
48
@@ -XXX,XX +XXX,XX @@
49
#ifdef __x86_64__
50
# define TCG_TARGET_REG_BITS 64
51
# define TCG_TARGET_NB_REGS 32
52
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
53
#else
54
# define TCG_TARGET_REG_BITS 32
55
# define TCG_TARGET_NB_REGS 24
56
+# define MAX_CODE_GEN_BUFFER_SIZE UINT32_MAX
57
#endif
58
59
typedef enum {
60
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
61
index XXXXXXX..XXXXXXX 100644
62
--- a/tcg/mips/tcg-target.h
63
+++ b/tcg/mips/tcg-target.h
64
@@ -XXX,XX +XXX,XX @@
65
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
66
#define TCG_TARGET_NB_REGS 32
67
68
+/*
69
+ * We have a 256MB branch region, but leave room to make sure the
70
+ * main executable is also within that region.
71
+ */
72
+#define MAX_CODE_GEN_BUFFER_SIZE (128 * MiB)
73
+
74
typedef enum {
75
TCG_REG_ZERO = 0,
76
TCG_REG_AT,
77
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
78
index XXXXXXX..XXXXXXX 100644
79
--- a/tcg/ppc/tcg-target.h
80
+++ b/tcg/ppc/tcg-target.h
81
@@ -XXX,XX +XXX,XX @@
82
83
#ifdef _ARCH_PPC64
84
# define TCG_TARGET_REG_BITS 64
85
+# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
86
#else
87
# define TCG_TARGET_REG_BITS 32
88
+# define MAX_CODE_GEN_BUFFER_SIZE (32 * MiB)
89
#endif
90
91
#define TCG_TARGET_NB_REGS 64
92
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
93
index XXXXXXX..XXXXXXX 100644
94
--- a/tcg/riscv/tcg-target.h
95
+++ b/tcg/riscv/tcg-target.h
96
@@ -XXX,XX +XXX,XX @@
97
#define TCG_TARGET_INSN_UNIT_SIZE 4
98
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
99
#define TCG_TARGET_NB_REGS 32
100
+#define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
101
102
typedef enum {
103
TCG_REG_ZERO,
104
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
105
index XXXXXXX..XXXXXXX 100644
106
--- a/tcg/s390/tcg-target.h
107
+++ b/tcg/s390/tcg-target.h
108
@@ -XXX,XX +XXX,XX @@
109
#define TCG_TARGET_INSN_UNIT_SIZE 2
110
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 19
111
112
+/* We have a +- 4GB range on the branches; leave some slop. */
113
+#define MAX_CODE_GEN_BUFFER_SIZE (3 * GiB)
114
+
115
typedef enum TCGReg {
116
TCG_REG_R0 = 0,
117
TCG_REG_R1,
118
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
119
index XXXXXXX..XXXXXXX 100644
120
--- a/tcg/sparc/tcg-target.h
121
+++ b/tcg/sparc/tcg-target.h
122
@@ -XXX,XX +XXX,XX @@
123
#define TCG_TARGET_INSN_UNIT_SIZE 4
124
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
125
#define TCG_TARGET_NB_REGS 32
126
+#define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
127
128
typedef enum {
129
TCG_REG_G0 = 0,
130
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
131
index XXXXXXX..XXXXXXX 100644
132
--- a/tcg/tci/tcg-target.h
133
+++ b/tcg/tci/tcg-target.h
134
@@ -XXX,XX +XXX,XX @@
135
#define TCG_TARGET_INTERPRETER 1
136
#define TCG_TARGET_INSN_UNIT_SIZE 1
137
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
138
+#define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
139
140
#if UINTPTR_MAX == UINT32_MAX
141
# define TCG_TARGET_REG_BITS 32
142
diff --git a/tcg/region.c b/tcg/region.c
143
index XXXXXXX..XXXXXXX 100644
144
--- a/tcg/region.c
145
+++ b/tcg/region.c
146
@@ -XXX,XX +XXX,XX @@ static size_t tcg_n_regions(unsigned max_cpus)
147
/*
148
* Minimum size of the code gen buffer. This number is randomly chosen,
149
* but not so small that we can't have a fair number of TB's live.
150
+ *
151
+ * Maximum size, MAX_CODE_GEN_BUFFER_SIZE, is defined in tcg-target.h.
152
+ * Unless otherwise indicated, this is constrained by the range of
153
+ * direct branches on the host cpu, as used by the TCG implementation
154
+ * of goto_tb.
155
*/
156
#define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
157
158
-/*
159
- * Maximum size of the code gen buffer we'd like to use. Unless otherwise
160
- * indicated, this is constrained by the range of direct branches on the
161
- * host cpu, as used by the TCG implementation of goto_tb.
162
- */
163
-#if defined(__x86_64__)
164
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
165
-#elif defined(__sparc__)
166
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
167
-#elif defined(__powerpc64__)
168
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
169
-#elif defined(__powerpc__)
170
-# define MAX_CODE_GEN_BUFFER_SIZE (32 * MiB)
171
-#elif defined(__aarch64__)
172
-# define MAX_CODE_GEN_BUFFER_SIZE (2 * GiB)
173
-#elif defined(__s390x__)
174
- /* We have a +- 4GB range on the branches; leave some slop. */
175
-# define MAX_CODE_GEN_BUFFER_SIZE (3 * GiB)
176
-#elif defined(__mips__)
177
- /*
178
- * We have a 256MB branch region, but leave room to make sure the
179
- * main executable is also within that region.
180
- */
181
-# define MAX_CODE_GEN_BUFFER_SIZE (128 * MiB)
182
-#else
183
-# define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
184
-#endif
185
-
186
#if TCG_TARGET_REG_BITS == 32
187
#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
188
#ifdef CONFIG_USER_ONLY
189
--
190
2.25.1
191
192
diff view generated by jsdifflib
New patch

A size is easier to work with than an end point,
particularly during initial buffer allocation.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/region.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)

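A worked example of the new capacity arithmetic may help review (illustrative numbers; TCG_HIGHWATER is 1024 per tcg-internal.h, and 4 KiB host pages are assumed):

  #include <stddef.h>

  static size_t example_capacity(void)
  {
      /* 8 regions carved out of a 16 MiB buffer, one 4 KiB guard page each;
       * the trailing guard page was already dropped from total_size. */
      size_t total_size = 16 * 1024 * 1024;
      size_t n = 8, guard = 4096, highwater = 1024;   /* TCG_HIGHWATER */

      return total_size
             - (n - 1) * guard      /* inner guard pages only */
             - n * highwater;       /* = 16740352 usable bytes */
  }
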
diff --git a/tcg/region.c b/tcg/region.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/tcg/region.c
14
+++ b/tcg/region.c
15
@@ -XXX,XX +XXX,XX @@ struct tcg_region_state {
16
/* fields set at init time */
17
void *start;
18
void *start_aligned;
19
- void *end;
20
size_t n;
21
size_t size; /* size of one region */
22
size_t stride; /* .size + guard size */
23
+ size_t total_size; /* size of entire buffer, >= n * stride */
24
25
/* fields protected by the lock */
26
size_t current; /* current region index */
27
@@ -XXX,XX +XXX,XX @@ static void tcg_region_bounds(size_t curr_region, void **pstart, void **pend)
28
if (curr_region == 0) {
29
start = region.start;
30
}
31
+ /* The final region may have a few extra pages due to earlier rounding. */
32
if (curr_region == region.n - 1) {
33
- end = region.end;
34
+ end = region.start_aligned + region.total_size;
35
}
36
37
*pstart = start;
38
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
39
*/
40
void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
41
{
42
- void *buf, *aligned;
43
- size_t size;
44
+ void *buf, *aligned, *end;
45
+ size_t total_size;
46
size_t page_size;
47
size_t region_size;
48
size_t n_regions;
49
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
50
assert(ok);
51
52
buf = tcg_init_ctx.code_gen_buffer;
53
- size = tcg_init_ctx.code_gen_buffer_size;
54
+ total_size = tcg_init_ctx.code_gen_buffer_size;
55
page_size = qemu_real_host_page_size;
56
n_regions = tcg_n_regions(max_cpus);
57
58
/* The first region will be 'aligned - buf' bytes larger than the others */
59
aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
60
- g_assert(aligned < tcg_init_ctx.code_gen_buffer + size);
61
+ g_assert(aligned < tcg_init_ctx.code_gen_buffer + total_size);
62
+
63
/*
64
* Make region_size a multiple of page_size, using aligned as the start.
65
* As a result of this we might end up with a few extra pages at the end of
66
* the buffer; we will assign those to the last region.
67
*/
68
- region_size = (size - (aligned - buf)) / n_regions;
69
+ region_size = (total_size - (aligned - buf)) / n_regions;
70
region_size = QEMU_ALIGN_DOWN(region_size, page_size);
71
72
/* A region must have at least 2 pages; one code, one guard */
73
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
74
region.start = buf;
75
region.start_aligned = aligned;
76
/* page-align the end, since its last page will be a guard page */
77
- region.end = QEMU_ALIGN_PTR_DOWN(buf + size, page_size);
78
+ end = QEMU_ALIGN_PTR_DOWN(buf + total_size, page_size);
79
/* account for that last guard page */
80
- region.end -= page_size;
81
+ end -= page_size;
82
+ total_size = end - aligned;
83
+ region.total_size = total_size;
84
85
/*
86
* Set guard pages in the rw buffer, as that's the one into which
87
@@ -XXX,XX +XXX,XX @@ void tcg_region_prologue_set(TCGContext *s)
88
89
/* Register the balance of the buffer with gdb. */
90
tcg_register_jit(tcg_splitwx_to_rx(region.start),
91
- region.end - region.start);
92
+ region.start_aligned + region.total_size - region.start);
93
}
94
95
/*
96
@@ -XXX,XX +XXX,XX @@ size_t tcg_code_capacity(void)
97
98
/* no need for synchronization; these variables are set at init time */
99
guard_size = region.stride - region.size;
100
- capacity = region.end + guard_size - region.start;
101
- capacity -= region.n * (guard_size + TCG_HIGHWATER);
102
+ capacity = region.total_size;
103
+ capacity -= (region.n - 1) * guard_size;
104
+ capacity -= region.n * TCG_HIGHWATER;
105
+
106
return capacity;
107
}
108
109
--
110
2.25.1
111
112
diff view generated by jsdifflib
New patch

Give the field a name reflecting its actual meaning.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/region.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

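A rough sketch of what the renamed fields refer to (simplified: the final region may be a few pages larger, and its trailing guard page sits outside total_size):

  /*
   *  start_aligned                          start_aligned + total_size
   *  |<---------------------- total_size ---------------------->|
   *  | prologue | region 0 | guard | region 1 | guard | ...     |
   *             ^
   *             after_prologue: where translated code begins
   */
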
diff --git a/tcg/region.c b/tcg/region.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/tcg/region.c
13
+++ b/tcg/region.c
14
@@ -XXX,XX +XXX,XX @@ struct tcg_region_state {
15
QemuMutex lock;
16
17
/* fields set at init time */
18
- void *start;
19
void *start_aligned;
20
+ void *after_prologue;
21
size_t n;
22
size_t size; /* size of one region */
23
size_t stride; /* .size + guard size */
24
@@ -XXX,XX +XXX,XX @@ static void tcg_region_bounds(size_t curr_region, void **pstart, void **pend)
25
end = start + region.size;
26
27
if (curr_region == 0) {
28
- start = region.start;
29
+ start = region.after_prologue;
30
}
31
/* The final region may have a few extra pages due to earlier rounding. */
32
if (curr_region == region.n - 1) {
33
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
34
region.n = n_regions;
35
region.size = region_size - page_size;
36
region.stride = region_size;
37
- region.start = buf;
38
+ region.after_prologue = buf;
39
region.start_aligned = aligned;
40
/* page-align the end, since its last page will be a guard page */
41
end = QEMU_ALIGN_PTR_DOWN(buf + total_size, page_size);
42
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
43
void tcg_region_prologue_set(TCGContext *s)
44
{
45
/* Deduct the prologue from the first region. */
46
- g_assert(region.start == s->code_gen_buffer);
47
- region.start = s->code_ptr;
48
+ g_assert(region.start_aligned == s->code_gen_buffer);
49
+ region.after_prologue = s->code_ptr;
50
51
/* Recompute boundaries of the first region. */
52
tcg_region_assign(s, 0);
53
54
/* Register the balance of the buffer with gdb. */
55
- tcg_register_jit(tcg_splitwx_to_rx(region.start),
56
- region.start_aligned + region.total_size - region.start);
57
+ tcg_register_jit(tcg_splitwx_to_rx(region.after_prologue),
58
+ region.start_aligned + region.total_size -
59
+ region.after_prologue);
60
}
61
62
/*
63
--
64
2.25.1
65
66
diff view generated by jsdifflib
New patch

Compute the value using straight division and bounds,
rather than a loop. Pass in tb_size rather than reading
from tcg_init_ctx.code_gen_buffer_size.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/region.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)

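Restating the new computation with two worked values (commentary only; the user-only and single-vCPU early returns from the patch are omitted):

  #include <stddef.h>

  #define MIN(a, b) ((a) < (b) ? (a) : (b))

  static size_t n_regions_for(size_t tb_size, unsigned max_cpus)
  {
      size_t n = tb_size / (2 * 1024 * 1024);   /* aim for >= 2 MiB per region */
      if (n <= max_cpus) {
          return max_cpus;                      /* one region per vCPU thread */
      }
      return MIN(n, max_cpus * 8);              /* at most 8 regions per thread */
  }

  /* n_regions_for(1024 * 1024 * 1024, 8) -> MIN(512, 64) = 64
   * n_regions_for(   8 * 1024 * 1024, 8) -> 4 <= 8, so 8       */
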
diff --git a/tcg/region.c b/tcg/region.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/tcg/region.c
15
+++ b/tcg/region.c
16
@@ -XXX,XX +XXX,XX @@ void tcg_region_reset_all(void)
17
tcg_region_tree_reset_all();
18
}
19
20
-static size_t tcg_n_regions(unsigned max_cpus)
21
+static size_t tcg_n_regions(size_t tb_size, unsigned max_cpus)
22
{
23
#ifdef CONFIG_USER_ONLY
24
return 1;
25
#else
26
+ size_t n_regions;
27
+
28
/*
29
* It is likely that some vCPUs will translate more code than others,
30
* so we first try to set more regions than max_cpus, with those regions
31
* being of reasonable size. If that's not possible we make do by evenly
32
* dividing the code_gen_buffer among the vCPUs.
33
*/
34
- size_t i;
35
-
36
/* Use a single region if all we have is one vCPU thread */
37
if (max_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
38
return 1;
39
}
40
41
- /* Try to have more regions than max_cpus, with each region being >= 2 MB */
42
- for (i = 8; i > 0; i--) {
43
- size_t regions_per_thread = i;
44
- size_t region_size;
45
-
46
- region_size = tcg_init_ctx.code_gen_buffer_size;
47
- region_size /= max_cpus * regions_per_thread;
48
-
49
- if (region_size >= 2 * 1024u * 1024) {
50
- return max_cpus * regions_per_thread;
51
- }
52
+ /*
53
+ * Try to have more regions than max_cpus, with each region being >= 2 MB.
54
+ * If we can't, then just allocate one region per vCPU thread.
55
+ */
56
+ n_regions = tb_size / (2 * MiB);
57
+ if (n_regions <= max_cpus) {
58
+ return max_cpus;
59
}
60
- /* If we can't, then just allocate one region per vCPU thread */
61
- return max_cpus;
62
+ return MIN(n_regions, max_cpus * 8);
63
#endif
64
}
65
66
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
67
buf = tcg_init_ctx.code_gen_buffer;
68
total_size = tcg_init_ctx.code_gen_buffer_size;
69
page_size = qemu_real_host_page_size;
70
- n_regions = tcg_n_regions(max_cpus);
71
+ n_regions = tcg_n_regions(total_size, max_cpus);
72
73
/* The first region will be 'aligned - buf' bytes larger than the others */
74
aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
75
--
76
2.25.1
77
78
diff view generated by jsdifflib
New patch

Return output buffer and size via output pointer arguments,
rather than returning size via tcg_ctx->code_gen_buffer_size.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/region.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

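The convention being adopted here — results reported through caller-supplied pointers rather than by mutating tcg_ctx — in a minimal standalone form (illustrative only, not the patch's splitting logic; assumes a power-of-two page size and a buffer of at least one page):

  #include <stddef.h>
  #include <stdint.h>

  static void align_buffer(void **obuf, size_t *osize,
                           void *buf, size_t size, size_t page_size)
  {
      /* Round the base up to the next page and shrink the size to match. */
      uintptr_t base = ((uintptr_t)buf + page_size - 1) & ~(uintptr_t)(page_size - 1);

      *osize = size - (base - (uintptr_t)buf);
      *obuf = (void *)base;
  }
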
diff --git a/tcg/region.c b/tcg/region.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/tcg/region.c
14
+++ b/tcg/region.c
15
@@ -XXX,XX +XXX,XX @@ static inline bool cross_256mb(void *addr, size_t size)
16
/*
17
* We weren't able to allocate a buffer without crossing that boundary,
18
* so make do with the larger portion of the buffer that doesn't cross.
19
- * Returns the new base of the buffer, and adjusts code_gen_buffer_size.
20
+ * Returns the new base and size of the buffer in *obuf and *osize.
21
*/
22
-static inline void *split_cross_256mb(void *buf1, size_t size1)
23
+static inline void split_cross_256mb(void **obuf, size_t *osize,
24
+ void *buf1, size_t size1)
25
{
26
void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
27
size_t size2 = buf1 + size1 - buf2;
28
@@ -XXX,XX +XXX,XX @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
29
buf1 = buf2;
30
}
31
32
- tcg_ctx->code_gen_buffer_size = size1;
33
- return buf1;
34
+ *obuf = buf1;
35
+ *osize = size1;
36
}
37
#endif
38
39
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
40
if (size > tb_size) {
41
size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
42
}
43
- tcg_ctx->code_gen_buffer_size = size;
44
45
#ifdef __mips__
46
if (cross_256mb(buf, size)) {
47
- buf = split_cross_256mb(buf, size);
48
- size = tcg_ctx->code_gen_buffer_size;
49
+ split_cross_256mb(&buf, &size, buf, size);
50
}
51
#endif
52
53
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
54
qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
55
56
tcg_ctx->code_gen_buffer = buf;
57
+ tcg_ctx->code_gen_buffer_size = size;
58
return true;
59
}
60
#elif defined(_WIN32)
61
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
62
"allocate %zu bytes for jit buffer", size);
63
return false;
64
}
65
- tcg_ctx->code_gen_buffer_size = size;
66
67
#ifdef __mips__
68
if (cross_256mb(buf, size)) {
69
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
70
/* fallthru */
71
default:
72
/* Split the original buffer. Free the smaller half. */
73
- buf2 = split_cross_256mb(buf, size);
74
- size2 = tcg_ctx->code_gen_buffer_size;
75
+ split_cross_256mb(&buf2, &size2, buf, size);
76
if (buf == buf2) {
77
munmap(buf + size2, size - size2);
78
} else {
79
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
80
qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
81
82
tcg_ctx->code_gen_buffer = buf;
83
+ tcg_ctx->code_gen_buffer_size = size;
84
return true;
85
}
86
87
--
88
2.25.1
89
90
diff view generated by jsdifflib
1
Shortly, the full code_gen_buffer will only be visible
to region.c, so move in_code_gen_buffer out-of-line.

Move the debugging versions of tcg_splitwx_to_{rx,rw}
to region.c as well, so that the compiler gets to see
the implementation of in_code_gen_buffer.

This leaves exactly one use of in_code_gen_buffer outside
of region.c, in cpu_restore_state, which, being on the
exception path, is not performance critical.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/tcg/tcg.h | 11 +----------
tcg/region.c | 34 ++++++++++++++++++++++++++++++++++
tcg/tcg.c | 23 -----------------------
3 files changed, 35 insertions(+), 33 deletions(-)

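One detail worth keeping in mind while the helper moves out-of-line: the single unsigned comparison covers both bounds, which is why the prototype can stay so small. Illustrative restatement:

  #include <stdbool.h>
  #include <stddef.h>

  /* Any p below base wraps to a huge unsigned value and fails the test;
   * p == base + size (one past the end) is deliberately accepted, like a
   * one-past-the-end array pointer. */
  static bool in_buffer(const char *base, size_t size, const void *p)
  {
      return (size_t)((const char *)p - base) <= size;
  }
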
diff --git a/scripts/git.orderfile b/scripts/git.orderfile
21
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
18
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
19
--- a/scripts/git.orderfile
23
--- a/include/tcg/tcg.h
20
+++ b/scripts/git.orderfile
24
+++ b/include/tcg/tcg.h
21
@@ -XXX,XX +XXX,XX @@ qga/*.json
25
@@ -XXX,XX +XXX,XX @@ extern const void *tcg_code_gen_epilogue;
22
# headers
26
extern uintptr_t tcg_splitwx_diff;
23
*.h
27
extern TCGv_env cpu_env;
24
28
25
+# decoding tree specification
29
-static inline bool in_code_gen_buffer(const void *p)
26
+*.decode
30
-{
31
- const TCGContext *s = &tcg_init_ctx;
32
- /*
33
- * Much like it is valid to have a pointer to the byte past the
34
- * end of an array (so long as you don't dereference it), allow
35
- * a pointer to the byte past the end of the code gen buffer.
36
- */
37
- return (size_t)(p - s->code_gen_buffer) <= s->code_gen_buffer_size;
38
-}
39
+bool in_code_gen_buffer(const void *p);
40
41
#ifdef CONFIG_DEBUG_TCG
42
const void *tcg_splitwx_to_rx(void *rw);
43
diff --git a/tcg/region.c b/tcg/region.c
44
index XXXXXXX..XXXXXXX 100644
45
--- a/tcg/region.c
46
+++ b/tcg/region.c
47
@@ -XXX,XX +XXX,XX @@ static struct tcg_region_state region;
48
static void *region_trees;
49
static size_t tree_size;
50
51
+bool in_code_gen_buffer(const void *p)
52
+{
53
+ const TCGContext *s = &tcg_init_ctx;
54
+ /*
55
+ * Much like it is valid to have a pointer to the byte past the
56
+ * end of an array (so long as you don't dereference it), allow
57
+ * a pointer to the byte past the end of the code gen buffer.
58
+ */
59
+ return (size_t)(p - s->code_gen_buffer) <= s->code_gen_buffer_size;
60
+}
27
+
61
+
28
# code
62
+#ifdef CONFIG_DEBUG_TCG
29
*.c
63
+const void *tcg_splitwx_to_rx(void *rw)
64
+{
65
+ /* Pass NULL pointers unchanged. */
66
+ if (rw) {
67
+ g_assert(in_code_gen_buffer(rw));
68
+ rw += tcg_splitwx_diff;
69
+ }
70
+ return rw;
71
+}
72
+
73
+void *tcg_splitwx_to_rw(const void *rx)
74
+{
75
+ /* Pass NULL pointers unchanged. */
76
+ if (rx) {
77
+ rx -= tcg_splitwx_diff;
78
+ /* Assert that we end with a pointer in the rw region. */
79
+ g_assert(in_code_gen_buffer(rx));
80
+ }
81
+ return (void *)rx;
82
+}
83
+#endif /* CONFIG_DEBUG_TCG */
84
+
85
/* compare a pointer @ptr and a tb_tc @s */
86
static int ptr_cmp_tb_tc(const void *ptr, const struct tb_tc *s)
87
{
88
diff --git a/tcg/tcg.c b/tcg/tcg.c
89
index XXXXXXX..XXXXXXX 100644
90
--- a/tcg/tcg.c
91
+++ b/tcg/tcg.c
92
@@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef constraint_sets[] = {
93
94
#include "tcg-target.c.inc"
95
96
-#ifdef CONFIG_DEBUG_TCG
97
-const void *tcg_splitwx_to_rx(void *rw)
98
-{
99
- /* Pass NULL pointers unchanged. */
100
- if (rw) {
101
- g_assert(in_code_gen_buffer(rw));
102
- rw += tcg_splitwx_diff;
103
- }
104
- return rw;
105
-}
106
-
107
-void *tcg_splitwx_to_rw(const void *rx)
108
-{
109
- /* Pass NULL pointers unchanged. */
110
- if (rx) {
111
- rx -= tcg_splitwx_diff;
112
- /* Assert that we end with a pointer in the rw region. */
113
- g_assert(in_code_gen_buffer(rx));
114
- }
115
- return (void *)rx;
116
-}
117
-#endif /* CONFIG_DEBUG_TCG */
118
-
119
static void alloc_tcg_plugin_context(TCGContext *s)
120
{
121
#ifdef CONFIG_PLUGIN
30
--
122
--
31
2.20.1
123
2.25.1
32
124
33
125
diff view generated by jsdifflib
1
Do not mess around with setting values within tcg_init_ctx.
Put the values into 'region' directly, which is where they
will live for the lifetime of the program.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/region.c | 64 ++++++++++++++++++++++------------------------
1 file changed, 27 insertions(+), 37 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
12
diff --git a/tcg/region.c b/tcg/region.c
12
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
13
--- a/accel/tcg/cputlb.c
14
--- a/tcg/region.c
14
+++ b/accel/tcg/cputlb.c
15
+++ b/tcg/region.c
15
@@ -XXX,XX +XXX,XX @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
16
@@ -XXX,XX +XXX,XX @@ static size_t tree_size;
16
QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
17
17
#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
18
bool in_code_gen_buffer(const void *p)
18
19
-static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)
20
+static inline size_t tlb_n_entries(CPUTLBDescFast *fast)
21
{
19
{
22
- return (env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS) + 1;
20
- const TCGContext *s = &tcg_init_ctx;
23
+ return (fast->mask >> CPU_TLB_ENTRY_BITS) + 1;
21
/*
22
* Much like it is valid to have a pointer to the byte past the
23
* end of an array (so long as you don't dereference it), allow
24
* a pointer to the byte past the end of the code gen buffer.
25
*/
26
- return (size_t)(p - s->code_gen_buffer) <= s->code_gen_buffer_size;
27
+ return (size_t)(p - region.start_aligned) <= region.total_size;
24
}
28
}
25
29
26
-static inline size_t sizeof_tlb(CPUArchState *env, uintptr_t mmu_idx)
30
#ifdef CONFIG_DEBUG_TCG
27
+static inline size_t sizeof_tlb(CPUTLBDescFast *fast)
31
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
32
}
33
qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
34
35
- tcg_ctx->code_gen_buffer = buf;
36
- tcg_ctx->code_gen_buffer_size = size;
37
+ region.start_aligned = buf;
38
+ region.total_size = size;
39
return true;
40
}
41
#elif defined(_WIN32)
42
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
43
return false;
44
}
45
46
- tcg_ctx->code_gen_buffer = buf;
47
- tcg_ctx->code_gen_buffer_size = size;
48
+ region.start_aligned = buf;
49
+ region.total_size = size;
50
return true;
51
}
52
#else
53
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
54
/* Request large pages for the buffer. */
55
qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
56
57
- tcg_ctx->code_gen_buffer = buf;
58
- tcg_ctx->code_gen_buffer_size = size;
59
+ region.start_aligned = buf;
60
+ region.total_size = size;
61
return true;
62
}
63
64
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
65
return false;
66
}
67
/* The size of the mapping may have been adjusted. */
68
- size = tcg_ctx->code_gen_buffer_size;
69
- buf_rx = tcg_ctx->code_gen_buffer;
70
+ buf_rx = region.start_aligned;
71
+ size = region.total_size;
72
#endif
73
74
buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
75
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
76
#endif
77
78
close(fd);
79
- tcg_ctx->code_gen_buffer = buf_rw;
80
- tcg_ctx->code_gen_buffer_size = size;
81
+ region.start_aligned = buf_rw;
82
+ region.total_size = size;
83
tcg_splitwx_diff = buf_rx - buf_rw;
84
85
/* Request large pages for the buffer and the splitwx. */
86
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
87
return false;
88
}
89
90
- buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
91
+ buf_rw = region.start_aligned;
92
buf_rx = 0;
93
ret = mach_vm_remap(mach_task_self(),
94
&buf_rx,
95
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
96
*/
97
void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
28
{
98
{
29
- return env_tlb(env)->f[mmu_idx].mask + (1 << CPU_TLB_ENTRY_BITS);
99
- void *buf, *aligned, *end;
30
+ return fast->mask + (1 << CPU_TLB_ENTRY_BITS);
100
- size_t total_size;
31
}
101
size_t page_size;
32
102
size_t region_size;
33
static void tlb_window_reset(CPUTLBDesc *desc, int64_t ns,
103
- size_t n_regions;
34
@@ -XXX,XX +XXX,XX @@ static void tlb_dyn_init(CPUArchState *env)
104
size_t i;
35
static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
105
bool ok;
36
{
106
37
CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
107
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
38
- size_t old_size = tlb_n_entries(env, mmu_idx);
108
splitwx, &error_fatal);
39
+ size_t old_size = tlb_n_entries(&env_tlb(env)->f[mmu_idx]);
109
assert(ok);
40
size_t rate;
110
41
size_t new_size = old_size;
111
- buf = tcg_init_ctx.code_gen_buffer;
42
int64_t now = get_clock_realtime();
112
- total_size = tcg_init_ctx.code_gen_buffer_size;
43
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
113
- page_size = qemu_real_host_page_size;
44
env_tlb(env)->d[mmu_idx].large_page_addr = -1;
114
- n_regions = tcg_n_regions(total_size, max_cpus);
45
env_tlb(env)->d[mmu_idx].large_page_mask = -1;
115
-
46
env_tlb(env)->d[mmu_idx].vindex = 0;
116
- /* The first region will be 'aligned - buf' bytes larger than the others */
47
- memset(env_tlb(env)->f[mmu_idx].table, -1, sizeof_tlb(env, mmu_idx));
117
- aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
48
+ memset(env_tlb(env)->f[mmu_idx].table, -1,
118
- g_assert(aligned < tcg_init_ctx.code_gen_buffer + total_size);
49
+ sizeof_tlb(&env_tlb(env)->f[mmu_idx]));
119
-
50
memset(env_tlb(env)->d[mmu_idx].vtable, -1,
120
/*
51
sizeof(env_tlb(env)->d[0].vtable));
121
* Make region_size a multiple of page_size, using aligned as the start.
52
}
122
* As a result of this we might end up with a few extra pages at the end of
53
@@ -XXX,XX +XXX,XX @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
123
* the buffer; we will assign those to the last region.
54
qemu_spin_lock(&env_tlb(env)->c.lock);
124
*/
55
for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
125
- region_size = (total_size - (aligned - buf)) / n_regions;
56
unsigned int i;
126
+ region.n = tcg_n_regions(region.total_size, max_cpus);
57
- unsigned int n = tlb_n_entries(env, mmu_idx);
127
+ page_size = qemu_real_host_page_size;
58
+ unsigned int n = tlb_n_entries(&env_tlb(env)->f[mmu_idx]);
128
+ region_size = region.total_size / region.n;
59
129
region_size = QEMU_ALIGN_DOWN(region_size, page_size);
60
for (i = 0; i < n; i++) {
130
61
tlb_reset_dirty_range_locked(&env_tlb(env)->f[mmu_idx].table[i],
131
/* A region must have at least 2 pages; one code, one guard */
132
g_assert(region_size >= 2 * page_size);
133
+ region.stride = region_size;
134
+
135
+ /* Reserve space for guard pages. */
136
+ region.size = region_size - page_size;
137
+ region.total_size -= page_size;
138
+
139
+ /*
140
+ * The first region will be smaller than the others, via the prologue,
141
+ * which has yet to be allocated. For now, the first region begins at
142
+ * the page boundary.
143
+ */
144
+ region.after_prologue = region.start_aligned;
145
146
/* init the region struct */
147
qemu_mutex_init(&region.lock);
148
- region.n = n_regions;
149
- region.size = region_size - page_size;
150
- region.stride = region_size;
151
- region.after_prologue = buf;
152
- region.start_aligned = aligned;
153
- /* page-align the end, since its last page will be a guard page */
154
- end = QEMU_ALIGN_PTR_DOWN(buf + total_size, page_size);
155
- /* account for that last guard page */
156
- end -= page_size;
157
- total_size = end - aligned;
158
- region.total_size = total_size;
159
160
/*
161
* Set guard pages in the rw buffer, as that's the one into which
62
--
162
--
63
2.20.1
163
2.25.1
64
164
65
165
diff view generated by jsdifflib
New patch

Change the interface from a boolean error indication to a
negative error vs a non-negative protection. For the moment
this is only an interface change, not making use of the new data.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
9
tcg/region.c | 63 +++++++++++++++++++++++++++-------------------------
10
1 file changed, 33 insertions(+), 30 deletions(-)
11
12
diff --git a/tcg/region.c b/tcg/region.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/tcg/region.c
15
+++ b/tcg/region.c
16
@@ -XXX,XX +XXX,XX @@ static inline void split_cross_256mb(void **obuf, size_t *osize,
17
static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
18
__attribute__((aligned(CODE_GEN_ALIGN)));
19
20
-static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
21
+static int alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
22
{
23
void *buf, *end;
24
size_t size;
25
26
if (splitwx > 0) {
27
error_setg(errp, "jit split-wx not supported");
28
- return false;
29
+ return -1;
30
}
31
32
/* page-align the beginning and end of the buffer */
33
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
34
35
region.start_aligned = buf;
36
region.total_size = size;
37
- return true;
38
+
39
+ return PROT_READ | PROT_WRITE;
40
}
41
#elif defined(_WIN32)
42
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
43
+static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
44
{
45
void *buf;
46
47
if (splitwx > 0) {
48
error_setg(errp, "jit split-wx not supported");
49
- return false;
50
+ return -1;
51
}
52
53
buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
54
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
55
56
region.start_aligned = buf;
57
region.total_size = size;
58
- return true;
59
+
60
+ return PAGE_READ | PAGE_WRITE | PAGE_EXEC;
61
}
62
#else
63
-static bool alloc_code_gen_buffer_anon(size_t size, int prot,
64
- int flags, Error **errp)
65
+static int alloc_code_gen_buffer_anon(size_t size, int prot,
66
+ int flags, Error **errp)
67
{
68
void *buf;
69
70
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
71
if (buf == MAP_FAILED) {
72
error_setg_errno(errp, errno,
73
"allocate %zu bytes for jit buffer", size);
74
- return false;
75
+ return -1;
76
}
77
78
#ifdef __mips__
79
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
80
81
region.start_aligned = buf;
82
region.total_size = size;
83
- return true;
84
+ return prot;
85
}
86
87
#ifndef CONFIG_TCG_INTERPRETER
88
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
89
90
#ifdef __mips__
91
/* Find space for the RX mapping, vs the 256MiB regions. */
92
- if (!alloc_code_gen_buffer_anon(size, PROT_NONE,
93
- MAP_PRIVATE | MAP_ANONYMOUS |
94
- MAP_NORESERVE, errp)) {
95
+ if (alloc_code_gen_buffer_anon(size, PROT_NONE,
96
+ MAP_PRIVATE | MAP_ANONYMOUS |
97
+ MAP_NORESERVE, errp) < 0) {
98
return false;
99
}
100
/* The size of the mapping may have been adjusted. */
101
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
102
/* Request large pages for the buffer and the splitwx. */
103
qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
104
qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
105
- return true;
106
+ return PROT_READ | PROT_WRITE;
107
108
fail_rx:
109
error_setg_errno(errp, errno, "failed to map shared memory for execute");
110
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
111
if (fd >= 0) {
112
close(fd);
113
}
114
- return false;
115
+ return -1;
116
}
117
#endif /* CONFIG_POSIX */
118
119
@@ -XXX,XX +XXX,XX @@ extern kern_return_t mach_vm_remap(vm_map_t target_task,
120
vm_prot_t *max_protection,
121
vm_inherit_t inheritance);
122
123
-static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
124
+static int alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
125
{
126
kern_return_t ret;
127
mach_vm_address_t buf_rw, buf_rx;
128
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
129
/* Map the read-write portion via normal anon memory. */
130
if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE,
131
MAP_PRIVATE | MAP_ANONYMOUS, errp)) {
132
- return false;
133
+ return -1;
134
}
135
136
buf_rw = region.start_aligned;
137
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
138
/* TODO: Convert "ret" to a human readable error message. */
139
error_setg(errp, "vm_remap for jit splitwx failed");
140
munmap((void *)buf_rw, size);
141
- return false;
142
+ return -1;
143
}
144
145
if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
146
error_setg_errno(errp, errno, "mprotect for jit splitwx");
147
munmap((void *)buf_rx, size);
148
munmap((void *)buf_rw, size);
149
- return false;
150
+ return -1;
151
}
152
153
tcg_splitwx_diff = buf_rx - buf_rw;
154
- return true;
155
+ return PROT_READ | PROT_WRITE;
156
}
157
#endif /* CONFIG_DARWIN */
158
#endif /* CONFIG_TCG_INTERPRETER */
159
160
-static bool alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
161
+static int alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
162
{
163
#ifndef CONFIG_TCG_INTERPRETER
164
# ifdef CONFIG_DARWIN
165
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
166
# endif
167
#endif
168
error_setg(errp, "jit split-wx not supported");
169
- return false;
170
+ return -1;
171
}
172
173
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
174
+static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
175
{
176
ERRP_GUARD();
177
int prot, flags;
178
179
if (splitwx) {
180
- if (alloc_code_gen_buffer_splitwx(size, errp)) {
181
- return true;
182
+ prot = alloc_code_gen_buffer_splitwx(size, errp);
183
+ if (prot >= 0) {
184
+ return prot;
185
}
186
/*
187
* If splitwx force-on (1), fail;
188
* if splitwx default-on (-1), fall through to splitwx off.
189
*/
190
if (splitwx > 0) {
191
- return false;
192
+ return -1;
193
}
194
error_free_or_abort(errp);
195
}
196
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
197
size_t page_size;
198
size_t region_size;
199
size_t i;
200
- bool ok;
201
+ int have_prot;
202
203
- ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
204
- splitwx, &error_fatal);
205
- assert(ok);
206
+ have_prot = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
207
+ splitwx, &error_fatal);
208
+ assert(have_prot >= 0);
209
210
/*
211
* Make region_size a multiple of page_size, using aligned as the start.
212
--
213
2.25.1
214
215
diff view generated by jsdifflib
1
No functional change, but the smaller expressions make
1
Move the call out of the N versions of alloc_code_gen_buffer
2
the code easier to read.
2
and into tcg_region_init.
3
3
4
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
4
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
5
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
7
---
9
accel/tcg/cputlb.c | 35 +++++++++++++++++------------------
8
tcg/region.c | 14 +++++++-------
10
1 file changed, 17 insertions(+), 18 deletions(-)
9
1 file changed, 7 insertions(+), 7 deletions(-)
11
10
12
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
11
diff --git a/tcg/region.c b/tcg/region.c
13
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
14
--- a/accel/tcg/cputlb.c
13
--- a/tcg/region.c
15
+++ b/accel/tcg/cputlb.c
14
+++ b/tcg/region.c
16
@@ -XXX,XX +XXX,XX @@ static void tlb_dyn_init(CPUArchState *env)
15
@@ -XXX,XX +XXX,XX @@ static int alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
17
16
error_setg_errno(errp, errno, "mprotect of jit buffer");
18
/**
17
return false;
19
* tlb_mmu_resize_locked() - perform TLB resize bookkeeping; resize if necessary
20
- * @env: CPU that owns the TLB
21
- * @mmu_idx: MMU index of the TLB
22
+ * @desc: The CPUTLBDesc portion of the TLB
23
+ * @fast: The CPUTLBDescFast portion of the same TLB
24
*
25
* Called with tlb_lock_held.
26
*
27
@@ -XXX,XX +XXX,XX @@ static void tlb_dyn_init(CPUArchState *env)
28
* high), since otherwise we are likely to have a significant amount of
29
* conflict misses.
30
*/
31
-static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
32
+static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
33
{
34
- CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
35
- size_t old_size = tlb_n_entries(&env_tlb(env)->f[mmu_idx]);
36
+ size_t old_size = tlb_n_entries(fast);
37
size_t rate;
38
size_t new_size = old_size;
39
int64_t now = get_clock_realtime();
40
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
41
return;
42
}
18
}
43
19
- qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
44
- g_free(env_tlb(env)->f[mmu_idx].table);
20
45
- g_free(env_tlb(env)->d[mmu_idx].iotlb);
21
region.start_aligned = buf;
46
+ g_free(fast->table);
22
region.total_size = size;
47
+ g_free(desc->iotlb);
23
@@ -XXX,XX +XXX,XX @@ static int alloc_code_gen_buffer_anon(size_t size, int prot,
48
24
}
49
tlb_window_reset(desc, now, 0);
25
#endif
50
/* desc->n_used_entries is cleared by the caller */
26
51
- env_tlb(env)->f[mmu_idx].mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
27
- /* Request large pages for the buffer. */
52
- env_tlb(env)->f[mmu_idx].table = g_try_new(CPUTLBEntry, new_size);
28
- qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
53
- env_tlb(env)->d[mmu_idx].iotlb = g_try_new(CPUIOTLBEntry, new_size);
29
-
54
+ fast->mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
30
region.start_aligned = buf;
55
+ fast->table = g_try_new(CPUTLBEntry, new_size);
31
region.total_size = size;
56
+ desc->iotlb = g_try_new(CPUIOTLBEntry, new_size);
32
return prot;
33
@@ -XXX,XX +XXX,XX @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
34
region.total_size = size;
35
tcg_splitwx_diff = buf_rx - buf_rw;
36
37
- /* Request large pages for the buffer and the splitwx. */
38
- qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
39
- qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
40
return PROT_READ | PROT_WRITE;
41
42
fail_rx:
43
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
44
splitwx, &error_fatal);
45
assert(have_prot >= 0);
46
47
+ /* Request large pages for the buffer and the splitwx. */
48
+ qemu_madvise(region.start_aligned, region.total_size, QEMU_MADV_HUGEPAGE);
49
+ if (tcg_splitwx_diff) {
50
+ qemu_madvise(region.start_aligned + tcg_splitwx_diff,
51
+ region.total_size, QEMU_MADV_HUGEPAGE);
52
+ }
57
+
53
+
58
/*
54
/*
59
* If the allocations fail, try smaller sizes. We just freed some
55
* Make region_size a multiple of page_size, using aligned as the start.
60
* memory, so going back to half of new_size has a good chance of working.
56
* As a result of this we might end up with a few extra pages at the end of
61
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
62
* allocations to fail though, so we progressively reduce the allocation
63
* size, aborting if we cannot even allocate the smallest TLB we support.
64
*/
65
- while (env_tlb(env)->f[mmu_idx].table == NULL ||
66
- env_tlb(env)->d[mmu_idx].iotlb == NULL) {
67
+ while (fast->table == NULL || desc->iotlb == NULL) {
68
if (new_size == (1 << CPU_TLB_DYN_MIN_BITS)) {
69
error_report("%s: %s", __func__, strerror(errno));
70
abort();
71
}
72
new_size = MAX(new_size >> 1, 1 << CPU_TLB_DYN_MIN_BITS);
73
- env_tlb(env)->f[mmu_idx].mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
74
+ fast->mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
75
76
- g_free(env_tlb(env)->f[mmu_idx].table);
77
- g_free(env_tlb(env)->d[mmu_idx].iotlb);
78
- env_tlb(env)->f[mmu_idx].table = g_try_new(CPUTLBEntry, new_size);
79
- env_tlb(env)->d[mmu_idx].iotlb = g_try_new(CPUIOTLBEntry, new_size);
80
+ g_free(fast->table);
81
+ g_free(desc->iotlb);
82
+ fast->table = g_try_new(CPUTLBEntry, new_size);
83
+ desc->iotlb = g_try_new(CPUIOTLBEntry, new_size);
84
}
85
}
86
87
static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
88
{
89
- tlb_mmu_resize_locked(env, mmu_idx);
90
+ tlb_mmu_resize_locked(&env_tlb(env)->d[mmu_idx], &env_tlb(env)->f[mmu_idx]);
91
env_tlb(env)->d[mmu_idx].n_used_entries = 0;
92
env_tlb(env)->d[mmu_idx].large_page_addr = -1;
93
env_tlb(env)->d[mmu_idx].large_page_mask = -1;
94
--
57
--
95
2.20.1
58
2.25.1
96
59
97
60
diff view generated by jsdifflib
1
There are no users of this function outside cputlb.c,
1
For --enable-tcg-interpreter on Windows, we will need this.
2
and its interface will change in the next patch.
3
2
4
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
3
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
4
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
5
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
7
---
9
include/exec/cpu_ldst.h | 5 -----
8
include/qemu/osdep.h | 1 +
10
accel/tcg/cputlb.c | 5 +++++
9
util/osdep.c | 9 +++++++++
11
2 files changed, 5 insertions(+), 5 deletions(-)
10
2 files changed, 10 insertions(+)
12
11
13
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
12
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
14
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
15
--- a/include/exec/cpu_ldst.h
14
--- a/include/qemu/osdep.h
16
+++ b/include/exec/cpu_ldst.h
15
+++ b/include/qemu/osdep.h
17
@@ -XXX,XX +XXX,XX @@ static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
16
@@ -XXX,XX +XXX,XX @@ void sigaction_invoke(struct sigaction *action,
18
return (addr >> TARGET_PAGE_BITS) & size_mask;
17
#endif
18
19
int qemu_madvise(void *addr, size_t len, int advice);
20
+int qemu_mprotect_rw(void *addr, size_t size);
21
int qemu_mprotect_rwx(void *addr, size_t size);
22
int qemu_mprotect_none(void *addr, size_t size);
23
24
diff --git a/util/osdep.c b/util/osdep.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/util/osdep.c
27
+++ b/util/osdep.c
28
@@ -XXX,XX +XXX,XX @@ static int qemu_mprotect__osdep(void *addr, size_t size, int prot)
29
#endif
19
}
30
}
20
31
21
-static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)
32
+int qemu_mprotect_rw(void *addr, size_t size)
22
-{
23
- return (env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS) + 1;
24
-}
25
-
26
/* Find the TLB entry corresponding to the mmu_idx + address pair. */
27
static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
28
target_ulong addr)
29
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/accel/tcg/cputlb.c
32
+++ b/accel/tcg/cputlb.c
33
@@ -XXX,XX +XXX,XX @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
34
QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
35
#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
36
37
+static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)
38
+{
33
+{
39
+ return (env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS) + 1;
34
+#ifdef _WIN32
35
+ return qemu_mprotect__osdep(addr, size, PAGE_READWRITE);
36
+#else
37
+ return qemu_mprotect__osdep(addr, size, PROT_READ | PROT_WRITE);
38
+#endif
40
+}
39
+}
41
+
40
+
42
static inline size_t sizeof_tlb(CPUArchState *env, uintptr_t mmu_idx)
41
int qemu_mprotect_rwx(void *addr, size_t size)
43
{
42
{
44
return env_tlb(env)->f[mmu_idx].mask + (1 << CPU_TLB_ENTRY_BITS);
43
#ifdef _WIN32
45
--
44
--
46
2.20.1
45
2.25.1
47
46
48
47
diff view generated by jsdifflib
1
No functional change, but the smaller expressions make
1
If qemu_get_host_physmem returns an odd number of pages,
2
the code easier to read.
2
then physmem / 8 will not be a multiple of the page size.
3
4
The following was observed on a gitlab runner:
5
6
ERROR qtest-arm/boot-serial-test - Bail out!
7
ERROR:../util/osdep.c:80:qemu_mprotect__osdep: \
8
assertion failed: (!(size & ~qemu_real_host_page_mask))
3
9
4
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
10
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
5
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
11
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
12
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
---
13
---
9
accel/tcg/cputlb.c | 19 ++++++++++---------
14
tcg/region.c | 47 +++++++++++++++++++++--------------------------
10
1 file changed, 10 insertions(+), 9 deletions(-)
15
1 file changed, 21 insertions(+), 26 deletions(-)
11
16
12
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
17
diff --git a/tcg/region.c b/tcg/region.c
13
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
14
--- a/accel/tcg/cputlb.c
19
--- a/tcg/region.c
15
+++ b/accel/tcg/cputlb.c
20
+++ b/tcg/region.c
16
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
21
@@ -XXX,XX +XXX,XX @@ static size_t tcg_n_regions(size_t tb_size, unsigned max_cpus)
17
22
(DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
18
static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
23
? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
24
25
-static size_t size_code_gen_buffer(size_t tb_size)
26
-{
27
- /* Size the buffer. */
28
- if (tb_size == 0) {
29
- size_t phys_mem = qemu_get_host_physmem();
30
- if (phys_mem == 0) {
31
- tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
32
- } else {
33
- tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
34
- }
35
- }
36
- if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
37
- tb_size = MIN_CODE_GEN_BUFFER_SIZE;
38
- }
39
- if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
40
- tb_size = MAX_CODE_GEN_BUFFER_SIZE;
41
- }
42
- return tb_size;
43
-}
44
-
45
#ifdef __mips__
46
/*
47
* In order to use J and JAL within the code_gen_buffer, we require
48
@@ -XXX,XX +XXX,XX @@ static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
49
*/
50
void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
19
{
51
{
20
- tlb_mmu_resize_locked(&env_tlb(env)->d[mmu_idx], &env_tlb(env)->f[mmu_idx]);
52
- size_t page_size;
21
- env_tlb(env)->d[mmu_idx].n_used_entries = 0;
53
+ const size_t page_size = qemu_real_host_page_size;
22
- env_tlb(env)->d[mmu_idx].large_page_addr = -1;
54
size_t region_size;
23
- env_tlb(env)->d[mmu_idx].large_page_mask = -1;
55
size_t i;
24
- env_tlb(env)->d[mmu_idx].vindex = 0;
56
int have_prot;
25
- memset(env_tlb(env)->f[mmu_idx].table, -1,
57
26
- sizeof_tlb(&env_tlb(env)->f[mmu_idx]));
58
- have_prot = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
27
- memset(env_tlb(env)->d[mmu_idx].vtable, -1,
59
- splitwx, &error_fatal);
28
- sizeof(env_tlb(env)->d[0].vtable));
60
+ /* Size the buffer. */
29
+ CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
61
+ if (tb_size == 0) {
30
+ CPUTLBDescFast *fast = &env_tlb(env)->f[mmu_idx];
62
+ size_t phys_mem = qemu_get_host_physmem();
63
+ if (phys_mem == 0) {
64
+ tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
65
+ } else {
66
+ tb_size = QEMU_ALIGN_DOWN(phys_mem / 8, page_size);
67
+ tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, tb_size);
68
+ }
69
+ }
70
+ if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
71
+ tb_size = MIN_CODE_GEN_BUFFER_SIZE;
72
+ }
73
+ if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
74
+ tb_size = MAX_CODE_GEN_BUFFER_SIZE;
75
+ }
31
+
76
+
32
+ tlb_mmu_resize_locked(desc, fast);
77
+ have_prot = alloc_code_gen_buffer(tb_size, splitwx, &error_fatal);
33
+ desc->n_used_entries = 0;
78
assert(have_prot >= 0);
34
+ desc->large_page_addr = -1;
79
35
+ desc->large_page_mask = -1;
80
/* Request large pages for the buffer and the splitwx. */
36
+ desc->vindex = 0;
81
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
37
+ memset(fast->table, -1, sizeof_tlb(fast));
82
* As a result of this we might end up with a few extra pages at the end of
38
+ memset(desc->vtable, -1, sizeof(desc->vtable));
83
* the buffer; we will assign those to the last region.
39
}
84
*/
40
85
- region.n = tcg_n_regions(region.total_size, max_cpus);
41
static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
86
- page_size = qemu_real_host_page_size;
87
- region_size = region.total_size / region.n;
88
+ region.n = tcg_n_regions(tb_size, max_cpus);
89
+ region_size = tb_size / region.n;
90
region_size = QEMU_ALIGN_DOWN(region_size, page_size);
91
92
/* A region must have at least 2 pages; one code, one guard */
42
--
93
--
43
2.20.1
94
2.25.1
44
95
45
96
diff view generated by jsdifflib
Do not call get_clock_realtime() in tlb_mmu_resize_locked,
but hoist outside of any loop over a set of tlbs. There are
only two (indirect) callers, tlb_flush_by_mmuidx_async_work
and tlb_flush_page_locked, so this is not onerous.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

Do not handle protections on a case-by-case basis in the
various alloc_code_gen_buffer instances; do it within a
single loop in tcg_region_init.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/region.c | 45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
12
diff --git a/tcg/region.c b/tcg/region.c
15
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
16
--- a/accel/tcg/cputlb.c
14
--- a/tcg/region.c
17
+++ b/accel/tcg/cputlb.c
15
+++ b/tcg/region.c
18
@@ -XXX,XX +XXX,XX @@ static void tlb_window_reset(CPUTLBDesc *desc, int64_t ns,
16
@@ -XXX,XX +XXX,XX @@ static int alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
19
* high), since otherwise we are likely to have a significant amount of
17
}
20
* conflict misses.
18
#endif
21
*/
19
22
-static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
20
- if (qemu_mprotect_rwx(buf, size)) {
23
+static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast,
21
- error_setg_errno(errp, errno, "mprotect of jit buffer");
24
+ int64_t now)
22
- return false;
23
- }
24
-
25
region.start_aligned = buf;
26
region.total_size = size;
27
28
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
25
{
29
{
26
size_t old_size = tlb_n_entries(fast);
30
const size_t page_size = qemu_real_host_page_size;
27
size_t rate;
31
size_t region_size;
28
size_t new_size = old_size;
32
- size_t i;
29
- int64_t now = get_clock_realtime();
33
- int have_prot;
30
int64_t window_len_ms = 100;
34
+ int have_prot, need_prot;
31
int64_t window_len_ns = window_len_ms * 1000 * 1000;
35
32
bool window_expired = now > desc->window_begin_ns + window_len_ns;
36
/* Size the buffer. */
33
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_flush_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
37
if (tb_size == 0) {
34
memset(desc->vtable, -1, sizeof(desc->vtable));
38
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
35
}
39
* Set guard pages in the rw buffer, as that's the one into which
36
40
* buffer overruns could occur. Do not set guard pages in the rx
37
-static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
41
* buffer -- let that one use hugepages throughout.
38
+static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx,
42
+ * Work with the page protections set up with the initial mapping.
39
+ int64_t now)
43
*/
40
{
44
- for (i = 0; i < region.n; i++) {
41
CPUTLBDesc *desc = &env_tlb(env)->d[mmu_idx];
45
+ need_prot = PAGE_READ | PAGE_WRITE;
42
CPUTLBDescFast *fast = &env_tlb(env)->f[mmu_idx];
46
+#ifndef CONFIG_TCG_INTERPRETER
43
47
+ if (tcg_splitwx_diff == 0) {
44
- tlb_mmu_resize_locked(desc, fast);
48
+ need_prot |= PAGE_EXEC;
45
+ tlb_mmu_resize_locked(desc, fast, now);
49
+ }
46
tlb_mmu_flush_locked(desc, fast);
50
+#endif
47
}
51
+ for (size_t i = 0, n = region.n; i < n; i++) {
48
52
void *start, *end;
49
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
53
50
CPUArchState *env = cpu->env_ptr;
54
tcg_region_bounds(i, &start, &end);
51
uint16_t asked = data.host_int;
55
+ if (have_prot != need_prot) {
52
uint16_t all_dirty, work, to_clean;
56
+ int rc;
53
+ int64_t now = get_clock_realtime();
57
54
58
- /*
55
assert_cpu_is_self(cpu);
59
- * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
56
60
- * rejects a permission change from RWX -> NONE. Guard pages are
57
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
61
- * nice for bug detection but are not essential; ignore any failure.
58
62
- */
59
for (work = to_clean; work != 0; work &= work - 1) {
63
- (void)qemu_mprotect_none(end, page_size);
60
int mmu_idx = ctz32(work);
64
+ if (need_prot == (PAGE_READ | PAGE_WRITE | PAGE_EXEC)) {
61
- tlb_flush_one_mmuidx_locked(env, mmu_idx);
65
+ rc = qemu_mprotect_rwx(start, end - start);
62
+ tlb_flush_one_mmuidx_locked(env, mmu_idx, now);
66
+ } else if (need_prot == (PAGE_READ | PAGE_WRITE)) {
67
+ rc = qemu_mprotect_rw(start, end - start);
68
+ } else {
69
+ g_assert_not_reached();
70
+ }
71
+ if (rc) {
72
+ error_setg_errno(&error_fatal, errno,
73
+ "mprotect of jit buffer");
74
+ }
75
+ }
76
+ if (have_prot != 0) {
77
+ /*
78
+ * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
79
+ * rejects a permission change from RWX -> NONE. Guard pages are
80
+ * nice for bug detection but are not essential; ignore any failure.
81
+ */
82
+ (void)qemu_mprotect_none(end, page_size);
83
+ }
63
}
84
}
64
85
65
qemu_spin_unlock(&env_tlb(env)->c.lock);
86
tcg_region_trees_init();
66
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_locked(CPUArchState *env, int midx,
67
tlb_debug("forcing full flush midx %d ("
68
TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
69
midx, lp_addr, lp_mask);
70
- tlb_flush_one_mmuidx_locked(env, midx);
71
+ tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
72
} else {
73
if (tlb_flush_entry_locked(tlb_entry(env, midx, page), page)) {
74
tlb_n_used_entries_dec(env, midx);
75
--
87
--
76
2.20.1
88
2.25.1
77
89
78
90
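The shape of the change in the cputlb patch above, reduced to a sketch (the
loop is simplified; env is the usual CPUArchState):

    int64_t now = get_clock_realtime();     /* read the clock once ...        */

    for (int mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
        /* ... and pass it down; tlb_mmu_resize_locked() no longer calls
         * get_clock_realtime() itself for every mmu_idx. */
        tlb_flush_one_mmuidx_locked(env, mmu_idx, now);
    }
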
By choosing "tcg:kvm" when kvm is not enabled, we generate
an incorrect warning: "invalid accelerator kvm".

At the same time, use g_str_has_suffix rather than open-coding
the same operation.

Presumably the inverse is also true with --disable-tcg.

Fixes: 28a0961757fc
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 vl.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

There's a change in mprotect() behaviour [1] in the latest macOS
on M1 and it's not yet clear if it's going to be fixed by Apple.

In this case, instead of changing permissions of N guard pages,
we change permissions of N rwx regions. The same number of
syscalls is required either way.

[1] https://gist.github.com/hikalium/75ae822466ee4da13cbbe486498a191f

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/region.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/vl.c b/vl.c
16
diff --git a/tcg/region.c b/tcg/region.c
19
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
20
--- a/vl.c
18
--- a/tcg/region.c
21
+++ b/vl.c
19
+++ b/tcg/region.c
22
@@ -XXX,XX +XXX,XX @@ static void configure_accelerators(const char *progname)
20
@@ -XXX,XX +XXX,XX @@ static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
23
21
error_free_or_abort(errp);
24
if (accel == NULL) {
22
}
25
/* Select the default accelerator */
23
26
- if (!accel_find("tcg") && !accel_find("kvm")) {
24
- prot = PROT_READ | PROT_WRITE | PROT_EXEC;
27
- error_report("No accelerator selected and"
25
+ /*
28
- " no default accelerator available");
26
+ * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
29
- exit(1);
27
+ * rejects a permission change from RWX -> NONE when reserving the
30
- } else {
28
+ * guard pages later. We can go the other way with the same number
31
- int pnlen = strlen(progname);
29
+ * of syscalls, so always begin with PROT_NONE.
32
- if (pnlen >= 3 && g_str_equal(&progname[pnlen - 3], "kvm")) {
30
+ */
33
+ bool have_tcg = accel_find("tcg");
31
+ prot = PROT_NONE;
34
+ bool have_kvm = accel_find("kvm");
32
flags = MAP_PRIVATE | MAP_ANONYMOUS;
35
+
33
-#ifdef CONFIG_TCG_INTERPRETER
36
+ if (have_tcg && have_kvm) {
34
- /* The tcg interpreter does not need execute permission. */
37
+ if (g_str_has_suffix(progname, "kvm")) {
35
- prot = PROT_READ | PROT_WRITE;
38
/* If the program name ends with "kvm", we prefer KVM */
36
-#elif defined(CONFIG_DARWIN)
39
accel = "kvm:tcg";
37
+#ifdef CONFIG_DARWIN
40
} else {
38
/* Applicable to both iOS and macOS (Apple Silicon). */
41
accel = "tcg:kvm";
39
if (!splitwx) {
42
}
40
flags |= MAP_JIT;
43
+ } else if (have_kvm) {
41
@@ -XXX,XX +XXX,XX @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
44
+ accel = "kvm";
45
+ } else if (have_tcg) {
46
+ accel = "tcg";
47
+ } else {
48
+ error_report("No accelerator selected and"
49
+ " no default accelerator available");
50
+ exit(1);
51
}
42
}
52
}
43
}
53
-
44
if (have_prot != 0) {
54
accel_list = g_strsplit(accel, ":", 0);
45
- /*
55
46
- * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
56
for (tmp = accel_list; *tmp; tmp++) {
47
- * rejects a permission change from RWX -> NONE. Guard pages are
48
- * nice for bug detection but are not essential; ignore any failure.
49
- */
50
+ /* Guard pages are nice for bug detection but are not essential. */
51
(void)qemu_mprotect_none(end, page_size);
52
}
53
}
57
--
54
--
58
2.20.1
55
2.25.1
59
56
60
57
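A condensed restatement of the default-accelerator selection added by the vl.c
patch above, with the availability flags shown as hypothetical example values
in the comments (sketch only):

    bool have_tcg = accel_find("tcg");      /* e.g. true                     */
    bool have_kvm = accel_find("kvm");      /* e.g. false: --disable-kvm     */

    if (have_tcg && have_kvm) {
        /* prefer KVM for *-kvm binaries, otherwise prefer TCG */
        accel = g_str_has_suffix(progname, "kvm") ? "kvm:tcg" : "tcg:kvm";
    } else if (have_kvm) {
        accel = "kvm";
    } else if (have_tcg) {
        accel = "tcg";             /* no bogus "invalid accelerator kvm" */
    } else {
        error_report("No accelerator selected and"
                     " no default accelerator available");
        exit(1);
    }
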
There's little point in leaving these data structures half initialized,
and relying on a flush to be done during reset.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

These variables belong to the jit side, not the user side.

Since tcg_init_ctx is no longer used outside of tcg/, move
the declaration to tcg-internal.h.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h         | 1 -
 tcg/tcg-internal.h        | 1 +
 accel/tcg/translate-all.c | 3 ---
 tcg/tcg.c                 | 3 +++
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
17
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
12
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
13
--- a/accel/tcg/cputlb.c
19
--- a/include/tcg/tcg.h
14
+++ b/accel/tcg/cputlb.c
20
+++ b/include/tcg/tcg.h
15
@@ -XXX,XX +XXX,XX @@ static void tlb_mmu_init(CPUTLBDesc *desc, CPUTLBDescFast *fast, int64_t now)
21
@@ -XXX,XX +XXX,XX @@ static inline bool temp_readonly(TCGTemp *ts)
16
fast->mask = (n_entries - 1) << CPU_TLB_ENTRY_BITS;
22
return ts->kind >= TEMP_FIXED;
17
fast->table = g_new(CPUTLBEntry, n_entries);
18
desc->iotlb = g_new(CPUIOTLBEntry, n_entries);
19
+ tlb_mmu_flush_locked(desc, fast);
20
}
23
}
21
24
22
static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
25
-extern TCGContext tcg_init_ctx;
23
@@ -XXX,XX +XXX,XX @@ void tlb_init(CPUState *cpu)
26
extern __thread TCGContext *tcg_ctx;
24
27
extern const void *tcg_code_gen_epilogue;
25
qemu_spin_init(&env_tlb(env)->c.lock);
28
extern uintptr_t tcg_splitwx_diff;
26
29
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
27
- /* Ensure that cpu_reset performs a full flush. */
30
index XXXXXXX..XXXXXXX 100644
28
- env_tlb(env)->c.dirty = ALL_MMUIDX_BITS;
31
--- a/tcg/tcg-internal.h
29
+ /* All tlbs are initialized flushed. */
32
+++ b/tcg/tcg-internal.h
30
+ env_tlb(env)->c.dirty = 0;
33
@@ -XXX,XX +XXX,XX @@
31
34
32
for (i = 0; i < NB_MMU_MODES; i++) {
35
#define TCG_HIGHWATER 1024
33
tlb_mmu_init(&env_tlb(env)->d[i], &env_tlb(env)->f[i], now);
36
37
+extern TCGContext tcg_init_ctx;
38
extern TCGContext **tcg_ctxs;
39
extern unsigned int tcg_cur_ctxs;
40
extern unsigned int tcg_max_ctxs;
41
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
42
index XXXXXXX..XXXXXXX 100644
43
--- a/accel/tcg/translate-all.c
44
+++ b/accel/tcg/translate-all.c
45
@@ -XXX,XX +XXX,XX @@ static int v_l2_levels;
46
47
static void *l1_map[V_L1_MAX_SIZE];
48
49
-/* code generation context */
50
-TCGContext tcg_init_ctx;
51
-__thread TCGContext *tcg_ctx;
52
TBContext tb_ctx;
53
54
static void page_table_config_init(void)
55
diff --git a/tcg/tcg.c b/tcg/tcg.c
56
index XXXXXXX..XXXXXXX 100644
57
--- a/tcg/tcg.c
58
+++ b/tcg/tcg.c
59
@@ -XXX,XX +XXX,XX @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
60
static int tcg_out_ldst_finalize(TCGContext *s);
61
#endif
62
63
+TCGContext tcg_init_ctx;
64
+__thread TCGContext *tcg_ctx;
65
+
66
TCGContext **tcg_ctxs;
67
unsigned int tcg_cur_ctxs;
68
unsigned int tcg_max_ctxs;
34
--
69
--
35
2.20.1
70
2.25.1
36
71
37
72
In target/arm we will shortly have "too many" mmu_idx.
The current minimum barrier is caused by the way in which
tlb_flush_page_by_mmuidx is coded.

We can remove this limitation by allocating memory for
consumption by the worker. Let us assume that this is
the unlikely case, as will be the case for the majority
of targets which have so far satisfied the BUILD_BUG_ON,
and only allocate memory when necessary.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 167 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 132 insertions(+), 35 deletions(-)

Introduce a function to remove everything emitted
since a given point.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h | 10 ++++++++++
 tcg/tcg.c         | 13 +++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
11
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
18
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
19
--- a/accel/tcg/cputlb.c
13
--- a/include/tcg/tcg.h
20
+++ b/accel/tcg/cputlb.c
14
+++ b/include/tcg/tcg.h
21
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_locked(CPUArchState *env, int midx,
15
@@ -XXX,XX +XXX,XX @@ void tcg_op_remove(TCGContext *s, TCGOp *op);
22
}
16
TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc);
17
TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
18
19
+/**
20
+ * tcg_remove_ops_after:
21
+ * @op: target operation
22
+ *
23
+ * Discard any opcodes emitted since @op. Expected usage is to save
24
+ * a starting point with tcg_last_op(), speculatively emit opcodes,
25
+ * then decide whether or not to keep those opcodes after the fact.
26
+ */
27
+void tcg_remove_ops_after(TCGOp *op);
28
+
29
void tcg_optimize(TCGContext *s);
30
31
/* Allocate a new temporary and initialize it with a constant. */
32
diff --git a/tcg/tcg.c b/tcg/tcg.c
33
index XXXXXXX..XXXXXXX 100644
34
--- a/tcg/tcg.c
35
+++ b/tcg/tcg.c
36
@@ -XXX,XX +XXX,XX @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
37
#endif
23
}
38
}
24
39
25
-/* As we are going to hijack the bottom bits of the page address for a
40
+void tcg_remove_ops_after(TCGOp *op)
26
- * mmuidx bit mask we need to fail to build if we can't do that
27
+/**
28
+ * tlb_flush_page_by_mmuidx_async_0:
29
+ * @cpu: cpu on which to flush
30
+ * @addr: page of virtual address to flush
31
+ * @idxmap: set of mmu_idx to flush
32
+ *
33
+ * Helper for tlb_flush_page_by_mmuidx and friends, flush one page
34
+ * at @addr from the tlbs indicated by @idxmap from @cpu.
35
*/
36
-QEMU_BUILD_BUG_ON(NB_MMU_MODES > TARGET_PAGE_BITS_MIN);
37
-
38
-static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
39
- run_on_cpu_data data)
40
+static void tlb_flush_page_by_mmuidx_async_0(CPUState *cpu,
41
+ target_ulong addr,
42
+ uint16_t idxmap)
43
{
44
CPUArchState *env = cpu->env_ptr;
45
- target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
46
- target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
47
- unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
48
int mmu_idx;
49
50
assert_cpu_is_self(cpu);
51
52
- tlb_debug("page addr:" TARGET_FMT_lx " mmu_map:0x%lx\n",
53
- addr, mmu_idx_bitmap);
54
+ tlb_debug("page addr:" TARGET_FMT_lx " mmu_map:0x%x\n", addr, idxmap);
55
56
qemu_spin_lock(&env_tlb(env)->c.lock);
57
for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
58
- if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
59
+ if ((idxmap >> mmu_idx) & 1) {
60
tlb_flush_page_locked(env, mmu_idx, addr);
61
}
62
}
63
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
64
tb_flush_jmp_cache(cpu, addr);
65
}
66
67
+/**
68
+ * tlb_flush_page_by_mmuidx_async_1:
69
+ * @cpu: cpu on which to flush
70
+ * @data: encoded addr + idxmap
71
+ *
72
+ * Helper for tlb_flush_page_by_mmuidx and friends, called through
73
+ * async_run_on_cpu. The idxmap parameter is encoded in the page
74
+ * offset of the target_ptr field. This limits the set of mmu_idx
75
+ * that can be passed via this method.
76
+ */
77
+static void tlb_flush_page_by_mmuidx_async_1(CPUState *cpu,
78
+ run_on_cpu_data data)
79
+{
41
+{
80
+ target_ulong addr_and_idxmap = (target_ulong) data.target_ptr;
42
+ TCGContext *s = tcg_ctx;
81
+ target_ulong addr = addr_and_idxmap & TARGET_PAGE_MASK;
82
+ uint16_t idxmap = addr_and_idxmap & ~TARGET_PAGE_MASK;
83
+
43
+
84
+ tlb_flush_page_by_mmuidx_async_0(cpu, addr, idxmap);
44
+ while (true) {
45
+ TCGOp *last = tcg_last_op();
46
+ if (last == op) {
47
+ return;
48
+ }
49
+ tcg_op_remove(s, last);
50
+ }
85
+}
51
+}
86
+
52
+
87
+typedef struct {
53
static TCGOp *tcg_op_alloc(TCGOpcode opc)
88
+ target_ulong addr;
89
+ uint16_t idxmap;
90
+} TLBFlushPageByMMUIdxData;
91
+
92
+/**
93
+ * tlb_flush_page_by_mmuidx_async_2:
94
+ * @cpu: cpu on which to flush
95
+ * @data: allocated addr + idxmap
96
+ *
97
+ * Helper for tlb_flush_page_by_mmuidx and friends, called through
98
+ * async_run_on_cpu. The addr+idxmap parameters are stored in a
99
+ * TLBFlushPageByMMUIdxData structure that has been allocated
100
+ * specifically for this helper. Free the structure when done.
101
+ */
102
+static void tlb_flush_page_by_mmuidx_async_2(CPUState *cpu,
103
+ run_on_cpu_data data)
104
+{
105
+ TLBFlushPageByMMUIdxData *d = data.host_ptr;
106
+
107
+ tlb_flush_page_by_mmuidx_async_0(cpu, d->addr, d->idxmap);
108
+ g_free(d);
109
+}
110
+
111
void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
112
{
54
{
113
- target_ulong addr_and_mmu_idx;
55
TCGContext *s = tcg_ctx;
114
-
115
tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%" PRIx16 "\n", addr, idxmap);
116
117
/* This should already be page aligned */
118
- addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
119
- addr_and_mmu_idx |= idxmap;
120
+ addr &= TARGET_PAGE_MASK;
121
122
- if (!qemu_cpu_is_self(cpu)) {
123
- async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_work,
124
- RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
125
+ if (qemu_cpu_is_self(cpu)) {
126
+ tlb_flush_page_by_mmuidx_async_0(cpu, addr, idxmap);
127
+ } else if (idxmap < TARGET_PAGE_SIZE) {
128
+ /*
129
+ * Most targets have only a few mmu_idx. In the case where
130
+ * we can stuff idxmap into the low TARGET_PAGE_BITS, avoid
131
+ * allocating memory for this operation.
132
+ */
133
+ async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_1,
134
+ RUN_ON_CPU_TARGET_PTR(addr | idxmap));
135
} else {
136
- tlb_flush_page_by_mmuidx_async_work(
137
- cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
138
+ TLBFlushPageByMMUIdxData *d = g_new(TLBFlushPageByMMUIdxData, 1);
139
+
140
+ /* Otherwise allocate a structure, freed by the worker. */
141
+ d->addr = addr;
142
+ d->idxmap = idxmap;
143
+ async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_2,
144
+ RUN_ON_CPU_HOST_PTR(d));
145
}
146
}
147
148
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
149
void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
150
uint16_t idxmap)
151
{
152
- const run_on_cpu_func fn = tlb_flush_page_by_mmuidx_async_work;
153
- target_ulong addr_and_mmu_idx;
154
-
155
tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
156
157
/* This should already be page aligned */
158
- addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
159
- addr_and_mmu_idx |= idxmap;
160
+ addr &= TARGET_PAGE_MASK;
161
162
- flush_all_helper(src_cpu, fn, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
163
- fn(src_cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
164
+ /*
165
+ * Allocate memory to hold addr+idxmap only when needed.
166
+ * See tlb_flush_page_by_mmuidx for details.
167
+ */
168
+ if (idxmap < TARGET_PAGE_SIZE) {
169
+ flush_all_helper(src_cpu, tlb_flush_page_by_mmuidx_async_1,
170
+ RUN_ON_CPU_TARGET_PTR(addr | idxmap));
171
+ } else {
172
+ CPUState *dst_cpu;
173
+
174
+ /* Allocate a separate data block for each destination cpu. */
175
+ CPU_FOREACH(dst_cpu) {
176
+ if (dst_cpu != src_cpu) {
177
+ TLBFlushPageByMMUIdxData *d
178
+ = g_new(TLBFlushPageByMMUIdxData, 1);
179
+
180
+ d->addr = addr;
181
+ d->idxmap = idxmap;
182
+ async_run_on_cpu(dst_cpu, tlb_flush_page_by_mmuidx_async_2,
183
+ RUN_ON_CPU_HOST_PTR(d));
184
+ }
185
+ }
186
+ }
187
+
188
+ tlb_flush_page_by_mmuidx_async_0(src_cpu, addr, idxmap);
189
}
190
191
void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr)
192
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
193
target_ulong addr,
194
uint16_t idxmap)
195
{
196
- const run_on_cpu_func fn = tlb_flush_page_by_mmuidx_async_work;
197
- target_ulong addr_and_mmu_idx;
198
-
199
tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
200
201
/* This should already be page aligned */
202
- addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
203
- addr_and_mmu_idx |= idxmap;
204
+ addr &= TARGET_PAGE_MASK;
205
206
- flush_all_helper(src_cpu, fn, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
207
- async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
208
+ /*
209
+ * Allocate memory to hold addr+idxmap only when needed.
210
+ * See tlb_flush_page_by_mmuidx for details.
211
+ */
212
+ if (idxmap < TARGET_PAGE_SIZE) {
213
+ flush_all_helper(src_cpu, tlb_flush_page_by_mmuidx_async_1,
214
+ RUN_ON_CPU_TARGET_PTR(addr | idxmap));
215
+ async_safe_run_on_cpu(src_cpu, tlb_flush_page_by_mmuidx_async_1,
216
+ RUN_ON_CPU_TARGET_PTR(addr | idxmap));
217
+ } else {
218
+ CPUState *dst_cpu;
219
+ TLBFlushPageByMMUIdxData *d;
220
+
221
+ /* Allocate a separate data block for each destination cpu. */
222
+ CPU_FOREACH(dst_cpu) {
223
+ if (dst_cpu != src_cpu) {
224
+ d = g_new(TLBFlushPageByMMUIdxData, 1);
225
+ d->addr = addr;
226
+ d->idxmap = idxmap;
227
+ async_run_on_cpu(dst_cpu, tlb_flush_page_by_mmuidx_async_2,
228
+ RUN_ON_CPU_HOST_PTR(d));
229
+ }
230
+ }
231
+
232
+ d = g_new(TLBFlushPageByMMUIdxData, 1);
233
+ d->addr = addr;
234
+ d->idxmap = idxmap;
235
+ async_safe_run_on_cpu(src_cpu, tlb_flush_page_by_mmuidx_async_2,
236
+ RUN_ON_CPU_HOST_PTR(d));
237
+ }
238
}
239
240
void tlb_flush_page_all_cpus_synced(CPUState *src, target_ulong addr)
241
--
56
--
242
2.20.1
57
2.25.1
243
58
244
59
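Two sketches of the ideas in the patches above. First, when the mmu_idx bitmap
is small enough to fit below TARGET_PAGE_SIZE, the cputlb patch encodes it in
the page offset of the (page-aligned) address instead of allocating a
TLBFlushPageByMMUIdxData block:

    /* pack: addr is page aligned, so its low bits are free */
    target_ulong enc = (addr & TARGET_PAGE_MASK) | idxmap;

    /* unpack, inside the async worker */
    target_ulong page = enc & TARGET_PAGE_MASK;
    uint16_t     map  = enc & ~TARGET_PAGE_MASK;

Second, a hypothetical front-end fragment showing how tcg_remove_ops_after()
is expected to be used, following the commentary added in the header
(emit_fast_path and fast_path_is_valid are made-up names):

    TCGOp *start = tcg_last_op();        /* remember the current tail       */

    emit_fast_path(ctx);                 /* speculatively emit some opcodes */

    if (!fast_path_is_valid(ctx)) {
        tcg_remove_ops_after(start);     /* discard everything since start  */
    }
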
The accel_initialised variable no longer has any setters.

Fixes: 6f6e1698a68c
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 vl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index XXXXXXX..XXXXXXX 100644
--- a/vl.c
+++ b/vl.c
@@ -XXX,XX +XXX,XX @@ static void configure_accelerators(const char *progname)
 {
     const char *accel;
     char **accel_list, **tmp;
-    bool accel_initialised = false;
     bool init_failed = false;
 
     qemu_opts_foreach(qemu_find_opts("icount"),
@@ -XXX,XX +XXX,XX @@ static void configure_accelerators(const char *progname)
 
     accel_list = g_strsplit(accel, ":", 0);
 
-    for (tmp = accel_list; !accel_initialised && tmp && *tmp; tmp++) {
+    for (tmp = accel_list; tmp && *tmp; tmp++) {
         /*
          * Filter invalid accelerators here, to prevent obscenities
          * such as "-machine accel=tcg,,thread=single".
--
2.20.1

At some point during the development of tcg_constant_*, I changed
my mind about whether such temps should be able to be passed to
tcg_temp_free_*. The final version committed allows this, but the
commentary was not updated to match.

Fixes: c0522136adf
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec);
 
 /*
  * Locate or create a read-only temporary that is a constant.
- * This kind of temporary need not and should not be freed.
+ * This kind of temporary need not be freed, but for convenience
+ * will be silently ignored by tcg_temp_free_*.
  */
 TCGTemp *tcg_constant_internal(TCGType type, int64_t val);
 
--
2.25.1

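A small usage sketch of the behaviour the corrected comment describes (dst and
src stand for existing TCGv_i32 temporaries; illustration only):

    TCGv_i32 c = tcg_constant_i32(0x3f);   /* read-only, pooled constant */

    tcg_gen_and_i32(dst, src, c);          /* use it like any other temp */

    tcg_temp_free_i32(c);                  /* allowed: silently ignored  */
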
The accel_list and tmp variables are only used when manufacturing
-machine accel options based on -accel.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 vl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index XXXXXXX..XXXXXXX 100644
--- a/vl.c
+++ b/vl.c
@@ -XXX,XX +XXX,XX @@ static int do_configure_accelerator(void *opaque, QemuOpts *opts, Error **errp)
 static void configure_accelerators(const char *progname)
 {
     const char *accel;
-    char **accel_list, **tmp;
     bool init_failed = false;
 
     qemu_opts_foreach(qemu_find_opts("icount"),
@@ -XXX,XX +XXX,XX @@ static void configure_accelerators(const char *progname)
 
     accel = qemu_opt_get(qemu_get_machine_opts(), "accel");
     if (QTAILQ_EMPTY(&qemu_accel_opts.head)) {
+        char **accel_list, **tmp;
+
         if (accel == NULL) {
             /* Select the default accelerator */
             if (!accel_find("tcg") && !accel_find("kvm")) {
--
2.20.1

From: "Jose R. Ziviani" <jziviani@suse.de>

Commit 5e8892db93 fixed several function signatures but tcg_out_op for
arm is missing. This patch fixes it as well.

Signed-off-by: Jose R. Ziviani <jziviani@suse.de>
Message-Id: <20210610224450.23425-1-jziviani@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.c.inc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -XXX,XX +XXX,XX @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 static void tcg_out_epilogue(TCGContext *s);
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
-                              const TCGArg *args, const int *const_args)
+                              const TCGArg args[TCG_MAX_OP_ARGS],
+                              const int const_args[TCG_MAX_OP_ARGS])
 {
     TCGArg a0, a1, a2, a3, a4, a5;
     int c;
--
2.25.1

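An aside on why the new declaration above is compatible with every caller: in
C an array parameter is adjusted to a pointer, so the two forms below declare
the same function type; the array form merely documents the expected element
count and matches the form already used by the other backends:

    void emit(const int args[16]);    /* adjusted to ...        */
    void emit(const int *args);       /* ... exactly this type  */
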
The result of g_strsplit is never NULL.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index XXXXXXX..XXXXXXX 100644
--- a/vl.c
+++ b/vl.c
@@ -XXX,XX +XXX,XX @@ static void configure_accelerators(const char *progname)
 
     accel_list = g_strsplit(accel, ":", 0);
 
-    for (tmp = accel_list; tmp && *tmp; tmp++) {
+    for (tmp = accel_list; *tmp; tmp++) {
         /*
          * Filter invalid accelerators here, to prevent obscenities
          * such as "-machine accel=tcg,,thread=single".
--
2.20.1

Typo in the conversion to FloatParts64.

Fixes: 572c4d862ff2
Fixes: Coverity CID 1457457
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210607223812.110596-1-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ float32 float32_exp2(float32 a, float_status *status)
 
     float_raise(float_flag_inexact, status);
 
-    float64_unpack_canonical(&xnp, float64_ln2, status);
+    float64_unpack_canonical(&tp, float64_ln2, status);
     xp = *parts_mul(&xp, &tp, status);
     xnp = xp;
 
--
2.25.1

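Why the one-character change above is sufficient: float32_exp2 works from the
identity

    2^a = e^(a * ln 2) = 1 + (a * ln 2) + (a * ln 2)^2 / 2! + ...

so ln 2 must end up in tp, the factor applied to the unpacked argument in the
following line, xp = *parts_mul(&xp, &tp, status). With the typo, ln 2 was
unpacked into xnp, which the later xnp = xp assignment overwrites anyway, and
tp presumably entered the multiply uninitialized, which is what the Coverity
report (CID 1457457) flagged.
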
From: Carlos Santos <casantos@redhat.com>

uClibc defines _SC_LEVEL1_ICACHE_LINESIZE and _SC_LEVEL1_DCACHE_LINESIZE
but the corresponding sysconf calls return -1, which is a valid result,
meaning that the limit is indeterminate.

Handle this situation using the fallback values instead of crashing due
to an assertion failure.

Signed-off-by: Carlos Santos <casantos@redhat.com>
Message-Id: <20191017123713.30192-1-casantos@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 util/cacheinfo.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

From: Luis Pires <luis.pires@eldorado.org.br>

Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Message-Id: <20210601125143.191165-1-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/devel/tcg.rst | 101 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 90 insertions(+), 11 deletions(-)

17
diff --git a/util/cacheinfo.c b/util/cacheinfo.c
10
diff --git a/docs/devel/tcg.rst b/docs/devel/tcg.rst
18
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
19
--- a/util/cacheinfo.c
12
--- a/docs/devel/tcg.rst
20
+++ b/util/cacheinfo.c
13
+++ b/docs/devel/tcg.rst
21
@@ -XXX,XX +XXX,XX @@ static void sys_cache_info(int *isize, int *dsize)
14
@@ -XXX,XX +XXX,XX @@ performances.
22
static void sys_cache_info(int *isize, int *dsize)
15
QEMU's dynamic translation backend is called TCG, for "Tiny Code
23
{
16
Generator". For more information, please take a look at ``tcg/README``.
24
# ifdef _SC_LEVEL1_ICACHE_LINESIZE
17
25
- *isize = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
18
-Some notable features of QEMU's dynamic translator are:
26
+ int tmp_isize = (int) sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
19
+The following sections outline some notable features and implementation
27
+ if (tmp_isize > 0) {
20
+details of QEMU's dynamic translator.
28
+ *isize = tmp_isize;
21
29
+ }
22
CPU state optimisations
30
# endif
23
-----------------------
31
# ifdef _SC_LEVEL1_DCACHE_LINESIZE
24
32
- *dsize = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
25
-The target CPUs have many internal states which change the way it
33
+ int tmp_dsize = (int) sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
26
-evaluates instructions. In order to achieve a good speed, the
34
+ if (tmp_dsize > 0) {
27
+The target CPUs have many internal states which change the way they
35
+ *dsize = tmp_dsize;
28
+evaluate instructions. In order to achieve a good speed, the
36
+ }
29
translation phase considers that some state information of the virtual
37
# endif
30
CPU cannot change in it. The state is recorded in the Translation
38
}
31
Block (TB). If the state changes (e.g. privilege level), a new TB will
39
#endif /* sys_cache_info */
32
@@ -XXX,XX +XXX,XX @@ Direct block chaining
33
---------------------
34
35
After each translated basic block is executed, QEMU uses the simulated
36
-Program Counter (PC) and other cpu state information (such as the CS
37
+Program Counter (PC) and other CPU state information (such as the CS
38
segment base value) to find the next basic block.
39
40
-In order to accelerate the most common cases where the new simulated PC
41
-is known, QEMU can patch a basic block so that it jumps directly to the
42
-next one.
43
+In its simplest, less optimized form, this is done by exiting from the
44
+current TB, going through the TB epilogue, and then back to the
45
+main loop. That’s where QEMU looks for the next TB to execute,
46
+translating it from the guest architecture if it isn’t already available
47
+in memory. Then QEMU proceeds to execute this next TB, starting at the
48
+prologue and then moving on to the translated instructions.
49
50
-The most portable code uses an indirect jump. An indirect jump makes
51
-it easier to make the jump target modification atomic. On some host
52
-architectures (such as x86 or PowerPC), the ``JUMP`` opcode is
53
-directly patched so that the block chaining has no overhead.
54
+Exiting from the TB this way will cause the ``cpu_exec_interrupt()``
55
+callback to be re-evaluated before executing additional instructions.
56
+It is mandatory to exit this way after any CPU state changes that may
57
+unmask interrupts.
58
+
59
+In order to accelerate the cases where the TB for the new
60
+simulated PC is already available, QEMU has mechanisms that allow
61
+multiple TBs to be chained directly, without having to go back to the
62
+main loop as described above. These mechanisms are:
63
+
64
+``lookup_and_goto_ptr``
65
+^^^^^^^^^^^^^^^^^^^^^^^
66
+
67
+Calling ``tcg_gen_lookup_and_goto_ptr()`` will emit a call to
68
+``helper_lookup_tb_ptr``. This helper will look for an existing TB that
69
+matches the current CPU state. If the destination TB is available its
70
+code address is returned, otherwise the address of the JIT epilogue is
71
+returned. The call to the helper is always followed by the tcg ``goto_ptr``
72
+opcode, which branches to the returned address. In this way, we either
73
+branch to the next TB or return to the main loop.
74
+
75
+``goto_tb + exit_tb``
76
+^^^^^^^^^^^^^^^^^^^^^
77
+
78
+The translation code usually implements branching by performing the
79
+following steps:
80
+
81
+1. Call ``tcg_gen_goto_tb()`` passing a jump slot index (either 0 or 1)
82
+ as a parameter.
83
+
84
+2. Emit TCG instructions to update the CPU state with any information
85
+ that has been assumed constant and is required by the main loop to
86
+ correctly locate and execute the next TB. For most guests, this is
87
+ just the PC of the branch destination, but others may store additional
88
+ data. The information updated in this step must be inferable from both
89
+ ``cpu_get_tb_cpu_state()`` and ``cpu_restore_state()``.
90
+
91
+3. Call ``tcg_gen_exit_tb()`` passing the address of the current TB and
92
+ the jump slot index again.
93
+
94
+Step 1, ``tcg_gen_goto_tb()``, will emit a ``goto_tb`` TCG
95
+instruction that later on gets translated to a jump to an address
96
+associated with the specified jump slot. Initially, this is the address
97
+of step 2's instructions, which update the CPU state information. Step 3,
98
+``tcg_gen_exit_tb()``, exits from the current TB returning a tagged
99
+pointer composed of the last executed TB’s address and the jump slot
100
+index.
101
+
102
+The first time this whole sequence is executed, step 1 simply jumps
103
+to step 2. Then the CPU state information gets updated and we exit from
104
+the current TB. As a result, the behavior is very similar to the less
105
+optimized form described earlier in this section.
106
+
107
+Next, the main loop looks for the next TB to execute using the
108
+current CPU state information (creating the TB if it wasn’t already
109
+available) and, before starting to execute the new TB’s instructions,
110
+patches the previously executed TB by associating one of its jump
111
+slots (the one specified in the call to ``tcg_gen_exit_tb()``) with the
112
+address of the new TB.
113
+
114
+The next time this previous TB is executed and we get to that same
115
+``goto_tb`` step, it will already be patched (assuming the destination TB
116
+is still in memory) and will jump directly to the first instruction of
117
+the destination TB, without going back to the main loop.
118
+
119
+For the ``goto_tb + exit_tb`` mechanism to be used, the following
120
+conditions need to be satisfied:
121
+
122
+* The change in CPU state must be constant, e.g., a direct branch and
123
+ not an indirect branch.
124
+
125
+* The direct branch cannot cross a page boundary. Memory mappings
126
+ may change, causing the code at the destination address to change.
127
+
128
+Note that, on step 3 (``tcg_gen_exit_tb()``), in addition to the
129
+jump slot index, the address of the TB just executed is also returned.
130
+This address corresponds to the TB that will be patched; it may be
131
+different than the one that was directly executed from the main loop
132
+if the latter had already been chained to other TBs.
133
134
Self-modifying code and translated code invalidation
135
----------------------------------------------------
40
--
136
--
41
2.20.1
137
2.25.1
42
138
43
139
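To connect the new documentation above with the code that uses these
mechanisms, here is a rough sketch of the branch-emission pattern a front end
typically follows. use_goto_tb and gen_update_pc are placeholders for whatever
a particular target provides; only the tcg_gen_* calls are the interfaces
described in the text:

static void gen_goto_tb(DisasContext *dc, unsigned slot, uint64_t dest)
{
    if (use_goto_tb(dc, dest)) {            /* direct branch, same page       */
        tcg_gen_goto_tb(slot);              /* step 1: pick jump slot 0 or 1  */
        gen_update_pc(dc, dest);            /* step 2: update the CPU state   */
        tcg_gen_exit_tb(dc->base.tb, slot); /* step 3: tagged exit            */
    } else {
        gen_update_pc(dc, dest);
        tcg_gen_lookup_and_goto_ptr();      /* chain via helper_lookup_tb_ptr */
    }
    dc->base.is_jmp = DISAS_NORETURN;       /* nothing executes past the exit */
}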