Profiling switch_mm_irqs_off() with several workloads shows
two hot spots that probably don't need to be there.
The patch placing the mm_cpumask test inside the prev == next
branch behind CONFIG_DEBUG_VM got merged into x86/mm already,
so here are the other two.
The approach used in v2 to ensure the call to flush_tlb_mm_range()
from __text_poke() remains a no-op is to clear the CPU from the
mm_cpumask of poking_mm. Fix suggested by Peter Zijlstra.
That way the only thing flush_tlb_mm_range() really ends up
doing is incrementing the tlb_gen, so that future users of
poking_mm flush the TLB.
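
To illustrate the idea, here is a small userspace model, not
kernel code. All names (model_mm, model_cpu, model_flush_range,
model_switch_to, model_switch_away) are hypothetical, and the
model tracks a single mm only. It assumes a flush request bumps
tlb_gen and only reaches CPUs whose bit is set in the mask, and
that a CPU compares generations when it switches to the mm.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical, heavily simplified stand-in for an mm. */
struct model_mm {
	uint64_t tlb_gen;   /* bumped on every flush request */
	uint32_t cpumask;   /* bit N set: CPU N may hold TLB entries */
	int ipis_sent;      /* model-only count of remote flushes */
};

/* Per-CPU state for the single mm this model tracks. */
struct model_cpu {
	uint64_t seen_tlb_gen;  /* generation this CPU last caught up to */
	int local_flushes;      /* model-only count of local flushes */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

/* Flush request: always bump tlb_gen, only bother CPUs in the mask. */
static void model_flush_range(struct model_mm *mm)
{
	mm->tlb_gen++;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (mm->cpumask & (1u << cpu)) {
			mm->ipis_sent++;
			cpus[cpu].seen_tlb_gen = mm->tlb_gen;
		}
	}
}

/* Switching to the mm: set our bit, flush locally if we are behind. */
static void model_switch_to(struct model_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	mm->cpumask |= 1u << cpu;
	if (cpus[cpu].seen_tlb_gen != mm->tlb_gen) {
		cpus[cpu].local_flushes++;
		cpus[cpu].seen_tlb_gen = mm->tlb_gen;
	}
}

/* Switching away: clear our bit so later flushes skip this CPU. */
static void model_switch_away(struct model_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	mm->cpumask &= ~(1u << cpu);
}
```

In this model, calling model_switch_to(), then
model_switch_away(), then model_flush_range() sends no IPIs and
only advances tlb_gen. The next model_switch_to() notices the
stale generation and flushes locally, which is the behaviour
described above for future users of the poking mm.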