Support of Virtual CPU Hotplug-like Feature for ARMv8+ Arch

[PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc

Posted by salil.mehta@opnsrc.net 4 months, 1 week ago

From: Salil Mehta <salil.mehta@huawei.com>

The TCG code cache is split into regions shared by vCPUs under MTTCG. For
cold-boot (early realized) vCPUs, regions are sized/allocated during bring-up.
However, when a vCPU is *lazy_realized* (administratively "disabled" at boot
and realized later on demand), its TCGContext may fail the very first code
region allocation if the shared TB cache is saturated by already-running
vCPUs.

Flushing the TB cache is the right remediation, but `tb_flush()` must be
performed from the safe execution context (cpu_exec_loop()/tb_gen_code()).
This patch wires a deferred flush:

  * In `tcg_region_initial_alloc__locked()`, treat an initial allocation
    failure for a lazily realized vCPU as non-fatal: set `s->tbflush_pend`
    and return.

  * In `tcg_tb_alloc()`, if `s->tbflush_pend` is observed, clear it and
    return NULL so the caller performs a synchronous `tb_flush()` and then
    retries allocation.

This avoids hangs observed when a newly realized vCPU cannot obtain its first
region under TB-cache pressure, while keeping the flush at a safe point.

No functional change for cold-boot vCPUs, or when the accelerator is KVM.

In earlier versions of the series, this patch was named:
'tcg: Update tcg_register_thread() leg to handle region alloc for hotplugged vCPU'

Reported-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 accel/tcg/tcg-accel-ops-mttcg.c |  2 +-
 accel/tcg/tcg-accel-ops-rr.c    |  2 +-
 hw/arm/virt.c                   |  5 +++++
 include/hw/core/cpu.h           |  1 +
 include/tcg/startup.h           |  6 ++++++
 include/tcg/tcg.h               |  1 +
 tcg/region.c                    | 16 ++++++++++++++++
 tcg/tcg.c                       | 19 ++++++++++++++++++-
 8 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index 337b993d3d..cdb7345340 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -73,7 +73,7 @@ static void *mttcg_cpu_thread_fn(void *arg)
     force_rcu.notifier.notify = mttcg_force_rcu;
     force_rcu.cpu = cpu;
     rcu_add_force_rcu_notifier(&force_rcu.notifier);
-    tcg_register_thread();
+    tcg_register_thread(cpu);
 
     bql_lock();
     qemu_thread_get_self(cpu->thread);
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 6eec5c9eee..18e713cada 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -186,7 +186,7 @@ static void *rr_cpu_thread_fn(void *arg)
     rcu_register_thread();
     force_rcu.notify = rr_force_rcu;
     rcu_add_force_rcu_notifier(&force_rcu);
-    tcg_register_thread();
+    tcg_register_thread(cpu);
 
     bql_lock();
     qemu_thread_get_self(cpu->thread);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5e02d6749d..254303727b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2482,6 +2482,11 @@ virt_setup_lazy_vcpu_realization(Object *cpuobj, VirtMachineState *vms)
     if (kvm_enabled()) {
         kvm_arm_create_host_vcpu(ARM_CPU(cpuobj));
     }
+
+    /* we may have to nuke the TB cache */
+    if (tcg_enabled()) {
+        CPU(cpuobj)->lazy_realized = true;
+    }
 }
 
 static void machvirt_init(MachineState *machine)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index c9ce9bbdaf..c2d45fb494 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -486,6 +486,7 @@ struct CPUState {
     bool stop;
     bool stopped;
     bool parked;
+    bool lazy_realized; /* realized after machine init (lazy realization) */
 
     /* Should CPU start in powered-off state? */
     bool start_powered_off;
diff --git a/include/tcg/startup.h b/include/tcg/startup.h
index 95f574af2b..f9126bb0bd 100644
--- a/include/tcg/startup.h
+++ b/include/tcg/startup.h
@@ -25,6 +25,8 @@
 #ifndef TCG_STARTUP_H
 #define TCG_STARTUP_H
 
+#include "hw/core/cpu.h"
+
 /**
  * tcg_init: Initialize the TCG runtime
  * @tb_size: translation buffer size
@@ -43,7 +45,11 @@ void tcg_init(size_t tb_size, int splitwx, unsigned max_threads);
  * accelerator's init_machine() method) must register with this
  * function before initiating translation.
  */
+#ifdef CONFIG_USER_ONLY
 void tcg_register_thread(void);
+#else
+void tcg_register_thread(CPUState *cpu);
+#endif
 
 /**
  * tcg_prologue_init(): Generate the code for the TCG prologue
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index a6d9aa50d4..e197ee03c0 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -396,6 +396,7 @@ struct TCGContext {
 
     /* Track which vCPU triggers events */
     CPUState *cpu;                      /* *_trans */
+    bool tbflush_pend; /* TB flush pending due to lazy vCPU realization */
 
     /* These structures are private to tcg-target.c.inc.  */
     QSIMPLEQ_HEAD(, TCGLabelQemuLdst) ldst_labels;
diff --git a/tcg/region.c b/tcg/region.c
index 7ea0b37a84..23635e0194 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -393,6 +393,22 @@ bool tcg_region_alloc(TCGContext *s)
 static void tcg_region_initial_alloc__locked(TCGContext *s)
 {
     bool err = tcg_region_alloc__locked(s);
+
+    /*
+     * Lazily realized vCPUs (administratively "disabled" at boot and realized
+     * later on demand) may initially fail to obtain even a single code region
+     * if the shared TB cache is under pressure from already running vCPUs.
+     *
+     * Treat this first-allocation failure as non-fatal: mark this TCGContext
+     * to request a TB cache flush and return. The flush is performed later,
+     * synchronously in the vCPU execution path (cpu_exec_loop()/tb_gen_code()),
+     * which is the safe place for tb_flush().
+     */
+    if (err && s->cpu && s->cpu->lazy_realized) {
+        s->tbflush_pend = true;
+        return;
+    }
+
     g_assert(!err);
 }
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index afac55a203..5867952ae7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1285,12 +1285,14 @@ void tcg_register_thread(void)
     tcg_ctx = &tcg_init_ctx;
 }
 #else
-void tcg_register_thread(void)
+void tcg_register_thread(CPUState *cpu)
 {
     TCGContext *s = g_malloc(sizeof(*s));
     unsigned int i, n;
 
     *s = tcg_init_ctx;
+    s->cpu = cpu;
+    s->tbflush_pend = false;
 
     /* Relink mem_base.  */
     for (i = 0, n = tcg_init_ctx.nb_globals; i < n; ++i) {
@@ -1871,6 +1873,21 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
     TranslationBlock *tb;
     void *next;
 
+    /*
+     * Lazy realization:
+     * A vCPU that was realized after machine init may have failed its first
+     * code-region allocation (see tcg_region_initial_alloc__locked()) and
+     * requested a deferred TB-cache flush by setting s->tbflush_pend.
+     *
+     * If the flag is set, do not attempt allocation here. Clear the flag and
+     * return NULL so the caller (tb_gen_code()/cpu_exec_loop()) can perform a
+     * safe tb_flush() and then retry TB allocation.
+     */
+    if (s->tbflush_pend) {
+        s->tbflush_pend = false;
+        return NULL;
+    }
+
  retry:
     tb = (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, align);
     next = (void *)ROUND_UP((uintptr_t)(tb + 1), align);
-- 
2.34.1
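
The deferred-flush control flow described in the commit message can be made concrete with a hypothetical toy model in C. It is not QEMU code and simplifies heavily. The shared buffer is NREGIONS equal regions, and each context keeps a bump pointer and a highwater mark. The helper names (`region_alloc()`, `register_ctx()`, `tb_alloc()`, `flush_all()`, `gen_tb()`) are illustrative stand-ins. They loosely correspond to `tcg_region_alloc__locked()`, the patched `tcg_region_initial_alloc__locked()`, the patched `tcg_tb_alloc()`, `tb_flush()` and the `tb_gen_code()` retry. Offsets replace pointers, and -1 plays the role of a NULL TB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU code: NREGIONS equal regions. */
enum { NREGIONS = 3, REGION_SIZE = 100, MAX_CTX = 4 };

struct ctx {
    size_t ptr, highwater;  /* bump pointer and end of the current region */
    bool lazy_realized;     /* realized after machine init */
    bool tbflush_pend;      /* deferred flush request, as in the patch */
};

struct pool {
    size_t next_region;          /* first region not yet handed out */
    struct ctx *ctxs[MAX_CTX];   /* registered contexts */
    size_t nctx;
    unsigned flushes;            /* number of full flushes so far */
};

/* Hand the next free region to @s; false if the shared pool is exhausted. */
static bool region_alloc(struct pool *p, struct ctx *s)
{
    if (p->next_region == NREGIONS) {
        return false;
    }
    s->ptr = p->next_region * REGION_SIZE;
    s->highwater = s->ptr + REGION_SIZE;
    p->next_region++;
    return true;
}

/* Analogue of the patched tcg_region_initial_alloc__locked(). */
static void register_ctx(struct pool *p, struct ctx *s)
{
    assert(p->nctx < MAX_CTX);
    p->ctxs[p->nctx++] = s;
    if (!region_alloc(p, s)) {
        assert(s->lazy_realized);  /* still fatal for cold-boot contexts */
        s->tbflush_pend = true;    /* defer the flush to a safe point */
    }
}

/* Analogue of the patched tcg_tb_alloc(); -1 plays the role of NULL. */
static long tb_alloc(struct pool *p, struct ctx *s, size_t size)
{
    if (s->tbflush_pend) {
        s->tbflush_pend = false;
        return -1;
    }
    while (s->ptr + size > s->highwater) {
        if (!region_alloc(p, s)) {
            return -1;             /* pool exhausted: caller must flush */
        }
    }
    long tb = (long)s->ptr;
    s->ptr += size;
    return tb;
}

/* Analogue of tb_flush(): drop all code, give every context a fresh region. */
static void flush_all(struct pool *p)
{
    p->next_region = 0;
    p->flushes++;
    for (size_t i = 0; i < p->nctx; i++) {
        bool ok = region_alloc(p, p->ctxs[i]);
        assert(ok);  /* the model assumes NREGIONS >= number of contexts */
        (void)ok;
    }
}

/* Caller side: on failure, flush at a "safe point" and retry once. */
static long gen_tb(struct pool *p, struct ctx *s, size_t size)
{
    long tb = tb_alloc(p, s, size);
    if (tb < 0) {
        flush_all(p);
        tb = tb_alloc(p, s, size);
    }
    return tb;
}
```

Consider two cold-boot contexts, one of which has already taken the last free region. Registering a lazily realized context then leaves `tbflush_pend` set. Its first `gen_tb()` call flushes once and returns an offset in the third region. The model assumes NREGIONS is at least the number of contexts, so a flush can always give every context a region.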

Re: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc

Posted by Richard Henderson 4 months, 1 week ago

On 9/30/25 18:01, salil.mehta@opnsrc.net wrote:
> [...]


I don't see why you need two different booleans for this.
It seems to me that you could create the cpu in a state for which the first call to 
tcg_tb_alloc() sees highwater state, and everything after that happens per usual 
allocating a new region, and possibly flushing the full buffer.

What is the testcase for this?


r~

RE: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc

Posted by Salil Mehta via 4 months, 1 week ago

Hi Richard,

Thanks for the reply. Please find my response inline.

Cheers.

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Richard Henderson
> Sent: Wednesday, October 1, 2025 10:34 PM
> To: salil.mehta@opnsrc.net; qemu-devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com
> Subject: Re: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc
> 
> On 9/30/25 18:01, salil.mehta@opnsrc.net wrote:
> > [...]
> 
> 
> I don't see why you need two different booleans for this.

I can see your point. Maybe I can move `s->tbflush_pend`  to 'CPUState' instead? 

> It seems to me that you could create the cpu in a state for which the first call to
> tcg_tb_alloc() sees highwater state, and everything after that happens per usual
> allocating a new region, and possibly flushing the full buffer.

Correct, but with one distinction: the highwater state is per TCGContext, while
the regions are allocated from a common pool, the 'Code Generation Buffer'.
'code_gen_highwater' is used to detect whether the current context needs another
region for dynamic translation to continue. The condition we are hitting is
different. It is the worst case, where the entire code generation buffer is
saturated and not even a single free TCG region can be allocated. In that case
we have no option other than to flush the entire buffer and reallocate regions
to all the threads. This rebalancing to accommodate a new vCPU is expensive, but
it does not happen every time. It is a worst-case condition, i.e. when the
system is under heavy stress and running out of resources.
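
The distinction can be sketched with a hypothetical toy classifier (not QEMU code). `struct pool` only counts how many regions of the shared buffer have been handed out. `struct ctx` holds one context's bump pointer and highwater mark. `classify()` reports which of three situations a TB allocation of a given size would hit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU code. */
enum { NREGIONS = 4 };

struct pool { size_t next_region; };     /* regions handed out so far */
struct ctx  { size_t ptr, highwater; };   /* per-context bump pointer */

enum outcome {
    FITS,         /* TB fits below this context's highwater mark */
    NEW_REGION,   /* highwater crossed; the shared pool still has a free region */
    NEEDS_FLUSH,  /* highwater crossed and the shared pool is exhausted */
};

static enum outcome classify(const struct pool *p, const struct ctx *s,
                             size_t size)
{
    assert(p->next_region <= NREGIONS);
    if (s->ptr + size <= s->highwater) {
        return FITS;
    }
    return p->next_region < NREGIONS ? NEW_REGION : NEEDS_FLUSH;
}
```

A context that owns no region yet can be encoded as ptr > highwater. Once the pool is exhausted, such a context classifies as NEEDS_FLUSH on its very first allocation, which corresponds to the lazily realized vCPU case.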

We are avoiding this crash:

ERROR:../tcg/region.c:396:tcg_region_initial_alloc__locked: assertion failed: (!err)
Bail out! ERROR:../tcg/region.c:396:tcg_region_initial_alloc__locked: assertion failed: (!err)
./run-qemu.sh: line 8: 255346 Aborted                 (core dumped) ./qemu/build/qemu-system-aarch64 -M virt,accel=tcg

Here is the backtrace:

Thread 65 "qemu-system-aar" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff48ff9640 (LWP 633577)]
0x00007ffff782f98c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff782f98c in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff77e2646 in raise () at /lib64/libc.so.6
#2  0x00007ffff77cc7f3 in abort () at /lib64/libc.so.6
#3  0x00007ffff7c21d6c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0
#4  0x00007ffff7c7ce2f in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
#5  0x00005555561cf359 in tcg_region_initial_alloc__locked (s=0x7fff10000b60) at ../tcg/region.c:396
#6  0x00005555561cf3ab in tcg_region_initial_alloc (s=0x7fff10000b60) at ../tcg/region.c:402
#7  0x00005555561da83c in tcg_register_thread () at ../tcg/tcg.c:820
#8  0x00005555561a97bb in mttcg_cpu_thread_fn (arg=0x555557e0c2b0) at ../accel/tcg/tcg-accel-ops-mttcg.c:77
#9  0x00005555564f18ab in qemu_thread_start (args=0x5555582e2bc0) at ../util/qemu-thread-posix.c:541
#10 0x00007ffff782dc12 in start_thread () at /lib64/libc.so.6
#11 0x00007ffff78b2cc0 in clone3 () at /lib64/libc.so.6
(gdb)

> 
> What is the testcase for this?

As mentioned, this tackles the worst case, when the 'code generation buffer' runs
out of space entirely. We need a better mitigation than simply calling assert().

It can easily be reproduced by decreasing 'tb_size', increasing the number of
vCPUs, and running larger programs simultaneously. I was able to reproduce it
with only 6 vCPUs and 'tb_size=10'. Booting was very slow, but a single vCPU
hotplug action was enough to reproduce it.

RFC V6 has TCG broken for some other reason, and I'm trying to fix it.
But if you wish, you can try this on RFC V5, which has a greater chance of
hitting this because it actually uses the vCPU hotplug approach, i.e. threads
can be created and deleted.

https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v5/

With RFC V6, this condition is likely to happen only once, during the delayed
spawning of the vCPU thread of a vCPU being lazily realized. We do not
delete the spawned thread.

Many thanks!

Best regards
Salil.

> 
> 
> r~

Re: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc

Posted by Richard Henderson 4 months, 1 week ago

On 10/2/25 05:27, Salil Mehta wrote:
> Hi Richard,
> 
> Thanks for the reply. Please find my response inline.
> 
> Cheers.
> 
>> [...]
>> I don't see why you need two different booleans for this.
> 
> 
> I can see your point. Maybe I can move `s->tbflush_pend`  to 'CPUState' instead?
> 
> 
>> It seems to me that you could create the cpu in a state for which the first call to
>> tcg_tb_alloc() sees highwater state, and everything after that happens per usual
>> allocating a new region, and possibly flushing the full buffer.
> 
> 
> Correct, but with one distinction: the highwater state is per TCGContext, while
> the regions are allocated from a common pool, the 'Code Generation Buffer'.
> 'code_gen_highwater' is used to detect whether the current context needs another
> region for dynamic translation to continue. The condition we are hitting is
> different. It is the worst case, where the entire code generation buffer is
> saturated and not even a single free TCG region can be allocated.

I think you misunderstand "and everything after that happens per usual".

When allocating a tb, if a cpu finds that its current region is full, then it tries to
allocate another region.  If that is not successful, then we flush the entire
code_gen_buffer and try again.

Thus tbflush_pend is exactly equivalent to setting

     s->code_gen_ptr > s->code_gen_highwater.

As far as lazy_realized...  The utility of the assert under these conditions may be called 
into question; we could just remove it.
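
The suggestion can be checked in a hypothetical toy model (not QEMU code; offsets instead of pointers, -1 instead of NULL). A newly registered context starts with ptr > highwater, standing in for `s->code_gen_ptr > s->code_gen_highwater`. `tb_alloc()` has no special case. Its ordinary slow path tries another region and returns -1 if the pool is exhausted. The caller's `flush_all()` then gives every registered context a fresh region, and the allocation is retried. `register_ctx_at_highwater()`, `flush_all()` and `gen_tb()` are illustrative names. Nothing on this path depends on the initial-allocation assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU code. */
enum { NREGIONS = 3, REGION_SIZE = 100, MAX_CTX = 4 };

struct ctx { size_t ptr, highwater; };  /* bump pointer, end of region */

struct pool {
    size_t next_region;          /* first region not yet handed out */
    struct ctx *ctxs[MAX_CTX];   /* registered contexts */
    size_t nctx;
    unsigned flushes;            /* number of full flushes so far */
};

static bool region_alloc(struct pool *p, struct ctx *s)
{
    if (p->next_region == NREGIONS) {
        return false;
    }
    s->ptr = p->next_region * REGION_SIZE;
    s->highwater = s->ptr + REGION_SIZE;
    p->next_region++;
    return true;
}

/* Register a context with no region: ptr > highwater forces the slow path. */
static void register_ctx_at_highwater(struct pool *p, struct ctx *s)
{
    assert(p->nctx < MAX_CTX);
    s->ptr = 1;
    s->highwater = 0;
    p->ctxs[p->nctx++] = s;
}

/* Unmodified slow path: take a new region if possible, else return -1. */
static long tb_alloc(struct pool *p, struct ctx *s, size_t size)
{
    while (s->ptr + size > s->highwater) {
        if (!region_alloc(p, s)) {
            return -1;
        }
    }
    long tb = (long)s->ptr;
    s->ptr += size;
    return tb;
}

/* Full flush: every registered context gets a fresh region. */
static void flush_all(struct pool *p)
{
    p->next_region = 0;
    p->flushes++;
    for (size_t i = 0; i < p->nctx; i++) {
        bool ok = region_alloc(p, p->ctxs[i]);
        assert(ok);  /* the model assumes NREGIONS >= number of contexts */
        (void)ok;
    }
}

/* Caller side: flush and retry once when allocation fails. */
static long gen_tb(struct pool *p, struct ctx *s, size_t size)
{
    long tb = tb_alloc(p, s, size);
    if (tb < 0) {
        flush_all(p);
        tb = tb_alloc(p, s, size);
    }
    return tb;
}
```

If the pool still has a free region when the new context allocates its first TB, the same slow path hands it that region and no flush happens. A flush occurs only once the pool is exhausted.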


r~

RE: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc

Posted by Salil Mehta via 4 months ago

Hi Richard,

Sorry for the delayed reply.

> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Thursday, October 2, 2025 4:41 PM
> 
> On 10/2/25 05:27, Salil Mehta wrote:
> > Hi Richard,
> >
> > Thanks for the reply. Please find my response inline.
> >
> > Cheers.
> >
> >> [...]
> >> I don't see why you need two different booleans for this.
> >
> >
> > I can see your point. Maybe I can move `s->tbflush_pend`  to 'CPUState' instead?
> >
> >
> >> It seems to me that you could create the cpu in a state for which the first call to
> >> tcg_tb_alloc() sees highwater state, and everything after that happens per usual
> >> allocating a new region, and possibly flushing the full buffer.
> >
> >
> > Correct, but with one distinction: the highwater state is per TCGContext, while
> > the regions are allocated from a common pool, the 'Code Generation Buffer'.
> > 'code_gen_highwater' is used to detect whether the current context needs another
> > region for dynamic translation to continue. The condition we are hitting is
> > different. It is the worst case, where the entire code generation buffer is
> > saturated and not even a single free TCG region can be allocated.
> 
> I think you misunderstand "and everything after that happens per usual".
> 
> When allocating a tb, if a cpu finds that its current region is full, then it tries
> to allocate another region.  If that is not successful, then we flush the entire
> code_gen_buffer and try again.
> 
> Thus tbflush_pend is exactly equivalent to setting
> 
>      s->code_gen_ptr > s->code_gen_highwater.
> 
> As far as lazy_realized...  The utility of the assert under these conditions may
> be called into question; we could just remove it.


I understand your point. I'll remove the 'tbflush_pend' flag and directly use
'code_gen_highwater = NULL', so that we hit the highwater condition early
when the TCG thread is lazily realized. And yes, we might have to either
remove or conditionally bypass the assert(). I will dig further and validate.
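
That plan can be sketched hypothetically in the same toy-model style (not QEMU code; offsets instead of pointers). `initial_alloc()` keeps the assert for cold-boot contexts. For a lazily realized context, it instead leaves highwater == 0 as the analogue of `code_gen_highwater = NULL`. The fast-path check `at_highwater()` then sends that context's first TB allocation to the ordinary slow path. The names `initial_alloc()` and `at_highwater()` are illustrative only. Whether QEMU should drop the assert or bypass it conditionally is still the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU code. */
enum { NREGIONS = 2, REGION_SIZE = 100 };

struct pool { size_t next_region; };   /* regions handed out so far */

struct ctx {
    size_t ptr, highwater;   /* bump pointer and end of the current region */
    bool lazy_realized;      /* realized after machine init */
};

/* Returns true on success (tcg_region_alloc__locked() returns true on error). */
static bool region_alloc(struct pool *p, struct ctx *s)
{
    if (p->next_region == NREGIONS) {
        return false;
    }
    s->ptr = p->next_region * REGION_SIZE;
    s->highwater = s->ptr + REGION_SIZE;
    p->next_region++;
    return true;
}

/*
 * No extra flag: a lazily realized context that cannot get its first
 * region is left "at highwater"; cold-boot contexts still assert.
 */
static void initial_alloc(struct pool *p, struct ctx *s)
{
    if (region_alloc(p, s)) {
        return;
    }
    assert(s->lazy_realized);  /* assert bypassed only for lazy contexts */
    s->ptr = 0;
    s->highwater = 0;          /* analogue of code_gen_highwater = NULL */
}

/* Fast-path check: true means the allocation must take the slow path. */
static bool at_highwater(const struct ctx *s, size_t size)
{
    /* With highwater == 0, any size > 0 takes the slow path. */
    return s->ptr + size > s->highwater;
}
```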

Many thanks for this optimization!

Best regards
Salil.


> 
> 
> r~