[PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions

Frederic Weisbecker posted 2 patches 1 week, 6 days ago
There is a newer version of this series
[PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Frederic Weisbecker 1 week, 6 days ago
When a cpuset isolated partition is created, updated or destroyed, the
IRQ threads are blindly affined to all the non-isolated CPUs. This
happens without taking the IRQ threads' initial affinity into account,
which ends up being ignored.

For example, in a system with 8 CPUs, if an IRQ and its kthread are
initially affined to CPU 5, creating an isolated partition containing
only CPU 2 will eventually end up affining the IRQ kthread to all
CPUs but CPU 2 (that is, CPUs 0,1,3-7), losing the kthread's preference
for CPU 5.

Besides the blind re-affining, this doesn't take care of the actual
low-level interrupt, which isn't migrated. As of today the only way to
isolate non-managed interrupts, along with their kthreads, is to
overwrite their affinity separately, for example through /proc/irq/.

To avoid doing that manually, future development should focus on
updating the IRQs' affinity whenever cpuset isolated partitions are
updated.

In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
To prevent that, set the PF_NO_SETAFFINITY flag on them.
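
For context: PF_NO_SETAFFINITY makes the scheduler refuse affinity
changes coming from the outside, so cpuset's set_cpus_allowed_ptr()
calls simply fail for these threads. Roughly (simplified sketch from
kernel/sched/core.c; the exact location varies across versions):

	/* Simplified from __set_cpus_allowed_ptr() in kernel/sched/core.c */
	if (p->flags & PF_NO_SETAFFINITY) {
		ret = -EINVAL;	/* cpuset's set_cpus_allowed_ptr() stops here */
		goto out;
	}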

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/irq/manage.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 400856abf672..76e2cbe21d1f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -176,7 +176,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
 }
 
 /**
- * irq_set_thread_affinity - Notify irq threads to adjust affinity
+ * irq_thread_update_affinity - Notify irq threads to adjust affinity
  * @desc:	irq descriptor which has affinity changed
  *
  * Just set IRQTF_AFFINITY and delegate the affinity setting to the
@@ -184,7 +184,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
  * we hold desc->lock and this code can be called from hard interrupt
  * context.
  */
-static void irq_set_thread_affinity(struct irq_desc *desc)
+static void irq_thread_update_affinity(struct irq_desc *desc)
 {
 	struct irqaction *action;
 
@@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		fallthrough;
 	case IRQ_SET_MASK_OK_NOCOPY:
 		irq_validate_effective_affinity(data);
-		irq_set_thread_affinity(desc);
+		irq_thread_update_affinity(desc);
 		ret = 0;
 	}
 
@@ -1035,8 +1035,16 @@ static void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *a
 		set_cpus_allowed_ptr(current, mask);
 	free_cpumask_var(mask);
 }
+
+static inline void irq_thread_set_affinity(struct task_struct *t,
+					   struct irq_desc *desc)
+{
+	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
+}
 #else
 static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
+static inline void irq_thread_set_affinity(struct task_struct *t,
+					   struct irq_desc *desc) { }
 #endif
 
 static int irq_wait_for_interrupt(struct irq_desc *desc,
@@ -1221,6 +1229,7 @@ static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
 	if (!action || !action->thread)
 		return;
 
+	irq_thread_set_affinity(action->thread, desc);
 	wake_up_process(action->thread);
 	wait_event(desc->wait_for_threads,
 		   test_bit(IRQTF_READY, &action->thread_flags));
@@ -1405,16 +1414,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 	 * references an already freed task_struct.
 	 */
 	new->thread = get_task_struct(t);
-	/*
-	 * Tell the thread to set its affinity. This is
-	 * important for shared interrupt handlers as we do
-	 * not invoke setup_affinity() for the secondary
-	 * handlers as everything is already set up. Even for
-	 * interrupts marked with IRQF_NO_BALANCE this is
-	 * correct as we want the thread to move to the cpu(s)
-	 * on which the requesting code placed the interrupt.
-	 */
-	set_bit(IRQTF_AFFINITY, &new->thread_flags);
+
 	return 0;
 }
 
-- 
2.51.0
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Marek Szyprowski 1 week, 4 days ago
On 18.11.2025 15:30, Frederic Weisbecker wrote:
> When a cpuset isolated partition is created / updated or destroyed,
> the IRQ threads are affine blindly to all the non-isolated CPUs. And
> this happens without taking into account the IRQ thread initial
> affinity that becomes ignored.
>
> For example in a system with 8 CPUs, if an IRQ and its kthread are
> initially affine to CPU 5, creating an isolated partition with only
> CPU 2 inside will eventually end up affining the IRQ kthread to all
> CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
> CPU 5.
>
> Besides the blind re-affinity, this doesn't take care of the actual
> low level interrupt which isn't migrated. As of today the only way to
> isolate non managed interrupts, along with their kthreads, is to
> overwrite their affinity separately, for example through /proc/irq/
>
> To avoid doing that manually, future development should focus on
> updating the IRQs affinity whenever cpuset isolated partitions are
> updated.
>
> In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
> To prevent from that, set the PF_NO_SETAFFINITY flag to them.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

This patch landed in today's linux-next as commit 844dcacab287 ("genirq: 
Fix interrupt threads affinity vs. cpuset isolated partitions"). In my 
tests I found that it triggers warnings on some of my test systems.
This is an example of such a warning:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/kthread.c:599 kthread_bind_mask+0x2c/0x84
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
6.18.0-rc1-00031-g844dcacab287 #16177 PREEMPT
Hardware name: Samsung Exynos (Flattened Device Tree)
Call trace:
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x88
  dump_stack_lvl from __warn+0x80/0x1d0
  __warn from warn_slowpath_fmt+0x1b0/0x1bc
  warn_slowpath_fmt from kthread_bind_mask+0x2c/0x84
  kthread_bind_mask from wake_up_and_wait_for_irq_thread_ready+0x3c/0xd4
  wake_up_and_wait_for_irq_thread_ready from __setup_irq+0x3e8/0x894
  __setup_irq from request_threaded_irq+0xe4/0x15c
  request_threaded_irq from devm_request_threaded_irq+0x78/0x104
  devm_request_threaded_irq from max8997_irq_init+0x15c/0x24c
  max8997_irq_init from max8997_i2c_probe+0x118/0x208
  max8997_i2c_probe from i2c_device_probe+0x1bc/0x358
  i2c_device_probe from really_probe+0xe0/0x3d8
  really_probe from __driver_probe_device+0x9c/0x1e0
  __driver_probe_device from driver_probe_device+0x30/0xc0
  driver_probe_device from __device_attach_driver+0xa8/0x120
  __device_attach_driver from bus_for_each_drv+0x84/0xdc
  bus_for_each_drv from __device_attach+0xb0/0x20c
  __device_attach from bus_probe_device+0x8c/0x90
  bus_probe_device from device_add+0x5b0/0x7f0
  device_add from i2c_new_client_device+0x170/0x360
  i2c_new_client_device from of_i2c_register_device+0x80/0xc8
  of_i2c_register_device from of_i2c_register_devices+0x84/0xf8
  of_i2c_register_devices from i2c_register_adapter+0x240/0x7b0
  i2c_register_adapter from s3c24xx_i2c_probe+0x2a0/0x570
  s3c24xx_i2c_probe from platform_probe+0x5c/0x98
  platform_probe from really_probe+0xe0/0x3d8
  really_probe from __driver_probe_device+0x9c/0x1e0
  __driver_probe_device from driver_probe_device+0x30/0xc0
  driver_probe_device from __driver_attach+0x124/0x1d4
  __driver_attach from bus_for_each_dev+0x70/0xc4
  bus_for_each_dev from bus_add_driver+0xe0/0x220
  bus_add_driver from driver_register+0x7c/0x118
  driver_register from do_one_initcall+0x70/0x328
  do_one_initcall from kernel_init_freeable+0x1c0/0x234
  kernel_init_freeable from kernel_init+0x1c/0x12c
  kernel_init from ret_from_fork+0x14/0x28
Exception stack(0xf082dfb0 to 0xf082dff8)
...
irq event stamp: 78529
hardirqs last  enabled at (78805): [<c01bc4f8>] __up_console_sem+0x50/0x60
hardirqs last disabled at (78816): [<c01bc4e4>] __up_console_sem+0x3c/0x60
softirqs last  enabled at (78784): [<c013bb14>] handle_softirqs+0x328/0x520
softirqs last disabled at (78779): [<c013beb8>] __irq_exit_rcu+0x144/0x1f0
---[ end trace 0000000000000000 ]---

Reverting $subject on top of linux-next fixes this issue. Let me know
how I can help debug it.


> ---
>   kernel/irq/manage.c | 26 +++++++++++++-------------
>   1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 400856abf672..76e2cbe21d1f 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -176,7 +176,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
>   }
>   
>   /**
> - * irq_set_thread_affinity - Notify irq threads to adjust affinity
> + * irq_thread_update_affinity - Notify irq threads to adjust affinity
>    * @desc:	irq descriptor which has affinity changed
>    *
>    * Just set IRQTF_AFFINITY and delegate the affinity setting to the
> @@ -184,7 +184,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
>    * we hold desc->lock and this code can be called from hard interrupt
>    * context.
>    */
> -static void irq_set_thread_affinity(struct irq_desc *desc)
> +static void irq_thread_update_affinity(struct irq_desc *desc)
>   {
>   	struct irqaction *action;
>   
> @@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>   		fallthrough;
>   	case IRQ_SET_MASK_OK_NOCOPY:
>   		irq_validate_effective_affinity(data);
> -		irq_set_thread_affinity(desc);
> +		irq_thread_update_affinity(desc);
>   		ret = 0;
>   	}
>   
> @@ -1035,8 +1035,16 @@ static void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *a
>   		set_cpus_allowed_ptr(current, mask);
>   	free_cpumask_var(mask);
>   }
> +
> +static inline void irq_thread_set_affinity(struct task_struct *t,
> +					   struct irq_desc *desc)
> +{
> +	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
> +}
>   #else
>   static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> +static inline void irq_thread_set_affinity(struct task_struct *t,
> +					   struct irq_desc *desc) { }
>   #endif
>   
>   static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1221,6 +1229,7 @@ static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
>   	if (!action || !action->thread)
>   		return;
>   
> +	irq_thread_set_affinity(action->thread, desc);
>   	wake_up_process(action->thread);
>   	wait_event(desc->wait_for_threads,
>   		   test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1405,16 +1414,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>   	 * references an already freed task_struct.
>   	 */
>   	new->thread = get_task_struct(t);
> -	/*
> -	 * Tell the thread to set its affinity. This is
> -	 * important for shared interrupt handlers as we do
> -	 * not invoke setup_affinity() for the secondary
> -	 * handlers as everything is already set up. Even for
> -	 * interrupts marked with IRQF_NO_BALANCE this is
> -	 * correct as we want the thread to move to the cpu(s)
> -	 * on which the requesting code placed the interrupt.
> -	 */
> -	set_bit(IRQTF_AFFINITY, &new->thread_flags);
> +
>   	return 0;
>   }
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Thomas Gleixner 1 week, 4 days ago
On Thu, Nov 20 2025 at 12:51, Marek Szyprowski wrote:
> On 18.11.2025 15:30, Frederic Weisbecker wrote:
>> In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
>> To prevent from that, set the PF_NO_SETAFFINITY flag to them.
>>
>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>
> This patch landed in today's linux-next as commit 844dcacab287 ("genirq: 
> Fix interrupt threads affinity vs. cpuset isolated partitions"). In my 
> tests I found that it triggers a warnings on some of my test systems. 
> This is example of such warning:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at kernel/kthread.c:599 kthread_bind_mask+0x2c/0x84
> Modules linked in:
> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
> 6.18.0-rc1-00031-g844dcacab287 #16177 PREEMPT
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Call trace:
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x68/0x88
>   dump_stack_lvl from __warn+0x80/0x1d0
>   __warn from warn_slowpath_fmt+0x1b0/0x1bc
>   warn_slowpath_fmt from kthread_bind_mask+0x2c/0x84
>   kthread_bind_mask from wake_up_and_wait_for_irq_thread_ready+0x3c/0xd4
>   wake_up_and_wait_for_irq_thread_ready from __setup_irq+0x3e8/0x894

Hmm. The only explanation for that is that the thread got woken up
already and left the initial UNINTERRUPTIBLE state and is now waiting
for an interrupt wakeup with INTERRUPTIBLE state.
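
(The two sleep sites, much simplified and quoted from memory:)

	/* kthread(): initial parking until the first official wakeup */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	complete(&done);
	schedule();

	/* irq_wait_for_interrupt(): waiting for hard interrupt wakeups */
	set_current_state(TASK_INTERRUPTIBLE);
	...
	schedule();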

To validate that theory, can you please apply the patch below? The extra
warning I added should trigger first.

Let me think about a proper cure...

Thanks,

        tglx
---
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -615,6 +615,8 @@ static void __kthread_bind(struct task_s
 void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
 {
 	struct kthread *kthread = to_kthread(p);
+
+	WARN_ON_ONCE(kthread->started);
 	__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
 	WARN_ON_ONCE(kthread->started);
 }
@@ -631,6 +633,8 @@ void kthread_bind_mask(struct task_struc
 void kthread_bind(struct task_struct *p, unsigned int cpu)
 {
 	struct kthread *kthread = to_kthread(p);
+
+	WARN_ON_ONCE(kthread->started);
 	__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
 	WARN_ON_ONCE(kthread->started);
 }
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Frederic Weisbecker 1 week, 4 days ago
Le Thu, Nov 20, 2025 at 12:51:31PM +0100, Marek Szyprowski a écrit :
> On 18.11.2025 15:30, Frederic Weisbecker wrote:
> > When a cpuset isolated partition is created / updated or destroyed,
> > the IRQ threads are affine blindly to all the non-isolated CPUs. And
> > this happens without taking into account the IRQ thread initial
> > affinity that becomes ignored.
> >
> > For example in a system with 8 CPUs, if an IRQ and its kthread are
> > initially affine to CPU 5, creating an isolated partition with only
> > CPU 2 inside will eventually end up affining the IRQ kthread to all
> > CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
> > CPU 5.
> >
> > Besides the blind re-affinity, this doesn't take care of the actual
> > low level interrupt which isn't migrated. As of today the only way to
> > isolate non managed interrupts, along with their kthreads, is to
> > overwrite their affinity separately, for example through /proc/irq/
> >
> > To avoid doing that manually, future development should focus on
> > updating the IRQs affinity whenever cpuset isolated partitions are
> > updated.
> >
> > In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
> > To prevent from that, set the PF_NO_SETAFFINITY flag to them.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> 
> This patch landed in today's linux-next as commit 844dcacab287 ("genirq: 
> Fix interrupt threads affinity vs. cpuset isolated partitions"). In my 
> tests I found that it triggers a warnings on some of my test systems. 
> This is example of such warning:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at kernel/kthread.c:599 kthread_bind_mask+0x2c/0x84

Erm, does this mean that the IRQ thread got woken up before the first official
wakeup in wake_up_and_wait_for_irq_thread_ready()? This looks wrong...

irq_startup() may be called on a few occasions before. So perhaps
the IRQ already fired and woke up the kthread once before the "official"
first wakeup?

There seems to be some initialization ordering issue here...

Thomas?

Thanks.
-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Thomas Gleixner 1 week, 4 days ago
On Thu, Nov 20 2025 at 14:20, Frederic Weisbecker wrote:
> Le Thu, Nov 20, 2025 at 12:51:31PM +0100, Marek Szyprowski a écrit :
>> On 18.11.2025 15:30, Frederic Weisbecker wrote:
>> > When a cpuset isolated partition is created / updated or destroyed,
>> > the IRQ threads are affine blindly to all the non-isolated CPUs. And
>> > this happens without taking into account the IRQ thread initial
>> > affinity that becomes ignored.
>> >
>> > For example in a system with 8 CPUs, if an IRQ and its kthread are
>> > initially affine to CPU 5, creating an isolated partition with only
>> > CPU 2 inside will eventually end up affining the IRQ kthread to all
>> > CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
>> > CPU 5.
>> >
>> > Besides the blind re-affinity, this doesn't take care of the actual
>> > low level interrupt which isn't migrated. As of today the only way to
>> > isolate non managed interrupts, along with their kthreads, is to
>> > overwrite their affinity separately, for example through /proc/irq/
>> >
>> > To avoid doing that manually, future development should focus on
>> > updating the IRQs affinity whenever cpuset isolated partitions are
>> > updated.
>> >
>> > In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
>> > To prevent from that, set the PF_NO_SETAFFINITY flag to them.
>> >
>> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>> 
>> This patch landed in today's linux-next as commit 844dcacab287 ("genirq: 
>> Fix interrupt threads affinity vs. cpuset isolated partitions"). In my 
>> tests I found that it triggers a warnings on some of my test systems. 
>> This is example of such warning:
>> 
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 1 at kernel/kthread.c:599 kthread_bind_mask+0x2c/0x84
>
> Erm, does this means that the IRQ thread got awaken before the first official
> wakeup in wake_up_and_wait_for_irq_thread_ready()? This looks wrong...
>
> irq_startup() may be called on a few occcasions before. So perhaps
> the IRQ already fired and woke up the kthread once before the "official"
> first wake up?
>
> There seem to be some initialization ordering issue here...

That's unavoidable because of locking and chicken-and-egg ordering problems.

When the thread is created the interrupt is not yet started up and
therefore effective affinity is not known. So doing a bind there is
pointless.

The action has to be made visible _before_ starting it up as once the
interrupt is unmasked it might fire. That's even more true for shared
interrupts.

The wakeup/bind/... muck cannot be invoked with the descriptor lock
held, so it has to be done _after_ the thing went live.

So yes, an interrupt which fires before kthread_bind() is invoked might
exactly have that effect. It wakes it from the kthread() wait and it
goes straight through to the thread function.

That's why the original code just set the affinity bit and let the
thread itself handle it.
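
For reference, a simplified sketch (quoted from memory, details may
differ across versions) of why such an early wakeup trips the warning:
__kthread_bind_mask() can only act on a task still parked in its
initial UNINTERRUPTIBLE sleep:

static void __kthread_bind_mask(struct task_struct *p,
				const struct cpumask *mask, unsigned int state)
{
	unsigned long flags;

	/* Fails and warns when the task already left @state, i.e. it ran */
	if (!wait_task_inactive(p, state)) {
		WARN_ON(1);
		return;
	}

	/* Safe because the task is known to be inactive */
	raw_spin_lock_irqsave(&p->pi_lock, flags);
	do_set_cpus_allowed(p, mask);
	p->flags |= PF_NO_SETAFFINITY;
	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
}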

There are three options to solve that:

      1) Bind the thread right after kthread_create() to a random
         housekeeping CPU and let it move itself over to the real place
         once it comes up (at this point the affinity is established)

      2) Serialize the setup so that the thread (even when woken up
         early) gets stuck in UNINTERRUPTIBLE state _before_ it can
         reach wait_for_interrupt(), which waits with INTERRUPTIBLE

      3) Teach kthread_create() that the thread is subject to being
         bound to something later on and implement that serialization
         there.

#1 is pretty straightforward. See the untested patch below.

#2 is ugly (I tried)

#3 could be useful in general, but that needs more thoughts

Thanks

        tglx
---
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -133,7 +133,15 @@ void __irq_wake_thread(struct irq_desc *
 	 */
 	atomic_inc(&desc->threads_active);
 
-	wake_up_process(action->thread);
+	/*
+	 * This might be a premature wakeup before the thread reached the
+	 * thread function and set the IRQTF_READY bit. It's waiting in
+	 * kthread code with state UNINTERRUPTIBLE. Once it reaches the
+	 * thread function it waits with INTERRUPTIBLE. The wakeup is not
+	 * lost in that case because the thread is guaranteed to observe
+	 * the RUN flag before it goes to sleep in wait_for_interrupt().
+	 */
+	wake_up_state(action->thread, TASK_INTERRUPTIBLE);
 }
 
 static DEFINE_STATIC_KEY_FALSE(irqhandler_duration_check_enabled);
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1026,16 +1026,8 @@ static void irq_thread_check_affinity(st
 	set_cpus_allowed_ptr(current, mask);
 	free_cpumask_var(mask);
 }
-
-static inline void irq_thread_set_affinity(struct task_struct *t,
-					   struct irq_desc *desc)
-{
-	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
-}
 #else
 static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
-static inline void irq_thread_set_affinity(struct task_struct *t,
-					   struct irq_desc *desc) { }
 #endif
 
 static int irq_wait_for_interrupt(struct irq_desc *desc,
@@ -1220,7 +1212,6 @@ static void wake_up_and_wait_for_irq_thr
 	if (!action || !action->thread)
 		return;
 
-	irq_thread_set_affinity(action->thread, desc);
 	wake_up_process(action->thread);
 	wait_event(desc->wait_for_threads,
 		   test_bit(IRQTF_READY, &action->thread_flags));
@@ -1389,26 +1380,35 @@ static void irq_nmi_teardown(struct irq_
 static int
 setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 {
+	/*
+	 * At this point interrupt affinity is not known. just assume that
+	 * the current CPU is not isolated and valid to bring the thread
+	 * up. The thread will move itself over to the right place once the
+	 * whole setup is complete.
+	 */
+	unsigned int cpu = raw_smp_processor_id();
 	struct task_struct *t;
 
-	if (!secondary) {
-		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
-				   new->name);
-	} else {
-		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
-				   new->name);
-	}
+	if (!secondary)
+		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-%s", irq, new->name);
+	else
+		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-s-%s", irq, new->name);
 
 	if (IS_ERR(t))
 		return PTR_ERR(t);
 
 	/*
-	 * We keep the reference to the task struct even if
-	 * the thread dies to avoid that the interrupt code
-	 * references an already freed task_struct.
+	 * We keep the reference to the task struct even if the thread dies
+	 * to avoid that the interrupt code references an already freed
+	 * task_struct.
 	 */
 	new->thread = get_task_struct(t);
 
+	/*
+	 * Ensure the thread adjusts the affinity once it reaches the
+	 * thread function.
+	 */
+	new->thread_flags = BIT(IRQTF_AFFINITY);
 	return 0;
 }
 
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Marek Szyprowski 1 week, 4 days ago
On 20.11.2025 16:00, Thomas Gleixner wrote:
> On Thu, Nov 20 2025 at 14:20, Frederic Weisbecker wrote:
>> Le Thu, Nov 20, 2025 at 12:51:31PM +0100, Marek Szyprowski a écrit :
>>> On 18.11.2025 15:30, Frederic Weisbecker wrote:
>>>> When a cpuset isolated partition is created / updated or destroyed,
>>>> the IRQ threads are affine blindly to all the non-isolated CPUs. And
>>>> this happens without taking into account the IRQ thread initial
>>>> affinity that becomes ignored.
>>>>
>>>> For example in a system with 8 CPUs, if an IRQ and its kthread are
>>>> initially affine to CPU 5, creating an isolated partition with only
>>>> CPU 2 inside will eventually end up affining the IRQ kthread to all
>>>> CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
>>>> CPU 5.
>>>>
>>>> Besides the blind re-affinity, this doesn't take care of the actual
>>>> low level interrupt which isn't migrated. As of today the only way to
>>>> isolate non managed interrupts, along with their kthreads, is to
>>>> overwrite their affinity separately, for example through /proc/irq/
>>>>
>>>> To avoid doing that manually, future development should focus on
>>>> updating the IRQs affinity whenever cpuset isolated partitions are
>>>> updated.
>>>>
>>>> In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
>>>> To prevent from that, set the PF_NO_SETAFFINITY flag to them.
>>>>
>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>> This patch landed in today's linux-next as commit 844dcacab287 ("genirq:
>>> Fix interrupt threads affinity vs. cpuset isolated partitions"). In my
>>> tests I found that it triggers a warnings on some of my test systems.
>>> This is example of such warning:
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 1 at kernel/kthread.c:599 kthread_bind_mask+0x2c/0x84
>> Erm, does this means that the IRQ thread got awaken before the first official
>> wakeup in wake_up_and_wait_for_irq_thread_ready()? This looks wrong...
>>
>> irq_startup() may be called on a few occcasions before. So perhaps
>> the IRQ already fired and woke up the kthread once before the "official"
>> first wake up?
>>
>> There seem to be some initialization ordering issue here...
> That's unavoidable because of locking and hen and egg ordering problems.
>
> When the thread is created the interrupt is not yet started up and
> therefore effective affinity is not known. So doing a bind there is
> pointless.
>
> The action has to be made visible _before_ starting it up as once the
> interrupt is unmasked it might fire. That's even more true for shared
> interrupts.
>
> The wakeup/bind/... muck cannot be invoked with the descriptor lock
> held, so it has to be done _after_ the thing went live.
>
> So yes an interrupt which fires before kthread_bind() is invoked might
> exactly have that effect. It wakes it from the kthread() wait and it
> goes straight through to the thread function.
>
> That's why the original code just set the affinity bit and let the
> thread itself handle it.
>
> There are three options to solve that:
>
>        1) Bind the thread right after kthread_create() to a random
>           housekeeping task and let move itself over to the real place
>           once it came up (at this point the affinity is established)
>
>        2) Serialize the setup so that the thread (even, when woken up
>           early) get's stuck in UNINTERRUPTIBLE state _before_ it can
>           reach wait_for_interrupt() which waits with INTERRUPTIBLE
>
>        3) Teach kthread_create() that the thread is subject to being
>           bound to something later on and implement that serialization
>           there.
>
> #1 is pretty straight forward. See untested below.

This fixes the observed issue. The only problem with that patch is the
incorrect set of arguments passed to kthread_create_on_cpu() (it doesn't
support printf-style args).
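
For reference, the prototype (quoting from memory, so double-check):
kthread_create_on_cpu() forwards only the CPU number to the name format:

struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
					  void *data, unsigned int cpu,
					  const char *namefmt);
/*
 * Internally this does roughly:
 *	kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt, cpu);
 * so namefmt is expected to carry a single %u for the CPU number rather
 * than arbitrary printf-style arguments.
 */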

If that matters:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>


> #2 is ugly (I tried)
>
> #3 could be useful in general, but that needs more thoughts
>
> Thanks
>
>          tglx
> ---
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -133,7 +133,15 @@ void __irq_wake_thread(struct irq_desc *
>   	 */
>   	atomic_inc(&desc->threads_active);
>   
> -	wake_up_process(action->thread);
> +	/*
> +	 * This might be a premature wakeup before the thread reached the
> +	 * thread function and set the IRQTF_READY bit. It's waiting in
> +	 * kthread code with state UNINTERRUPTIBLE. Once it reaches the
> +	 * thread function it waits with INTERRUPTIBLE. The wakeup is not
> +	 * lost in that case because the thread is guaranteed to observe
> +	 * the RUN flag before it goes to sleep in wait_for_interrupt().
> +	 */
> +	wake_up_state(action->thread, TASK_INTERRUPTIBLE);
>   }
>   
>   static DEFINE_STATIC_KEY_FALSE(irqhandler_duration_check_enabled);
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1026,16 +1026,8 @@ static void irq_thread_check_affinity(st
>   	set_cpus_allowed_ptr(current, mask);
>   	free_cpumask_var(mask);
>   }
> -
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc)
> -{
> -	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
> -}
>   #else
>   static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc) { }
>   #endif
>   
>   static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1220,7 +1212,6 @@ static void wake_up_and_wait_for_irq_thr
>   	if (!action || !action->thread)
>   		return;
>   
> -	irq_thread_set_affinity(action->thread, desc);
>   	wake_up_process(action->thread);
>   	wait_event(desc->wait_for_threads,
>   		   test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1389,26 +1380,35 @@ static void irq_nmi_teardown(struct irq_
>   static int
>   setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>   {
> +	/*
> +	 * At this point interrupt affinity is not known. just assume that
> +	 * the current CPU is not isolated and valid to bring the thread
> +	 * up. The thread will move itself over to the right place once the
> +	 * whole setup is complete.
> +	 */
> +	unsigned int cpu = raw_smp_processor_id();
>   	struct task_struct *t;
>   
> -	if (!secondary) {
> -		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
> -				   new->name);
> -	} else {
> -		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
> -				   new->name);
> -	}
> +	if (!secondary)
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-%s", irq, new->name);
> +	else
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-s-%s", irq, new->name);
>   
>   	if (IS_ERR(t))
>   		return PTR_ERR(t);
>   
>   	/*
> -	 * We keep the reference to the task struct even if
> -	 * the thread dies to avoid that the interrupt code
> -	 * references an already freed task_struct.
> +	 * We keep the reference to the task struct even if the thread dies
> +	 * to avoid that the interrupt code references an already freed
> +	 * task_struct.
>   	 */
>   	new->thread = get_task_struct(t);
>   
> +	/*
> +	 * Ensure the thread adjusts the affinity once it reaches the
> +	 * thread function.
> +	 */
> +	new->thread_flags = BIT(IRQTF_AFFINITY);
>   	return 0;
>   }
>   
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Frederic Weisbecker 1 week, 4 days ago
Le Thu, Nov 20, 2025 at 04:00:39PM +0100, Thomas Gleixner a écrit :
> On Thu, Nov 20 2025 at 14:20, Frederic Weisbecker wrote:
> > Erm, does this means that the IRQ thread got awaken before the first official
> > wakeup in wake_up_and_wait_for_irq_thread_ready()? This looks wrong...
> >
> > irq_startup() may be called on a few occcasions before. So perhaps
> > the IRQ already fired and woke up the kthread once before the "official"
> > first wake up?
> >
> > There seem to be some initialization ordering issue here...
> 
> That's unavoidable because of locking and hen and egg ordering problems.
> 
> When the thread is created the interrupt is not yet started up and
> therefore effective affinity is not known. So doing a bind there is
> pointless.
> 
> The action has to be made visible _before_ starting it up as once the
> interrupt is unmasked it might fire. That's even more true for shared
> interrupts.
> 
> The wakeup/bind/... muck cannot be invoked with the descriptor lock
> held, so it has to be done _after_ the thing went live.
> 
> So yes an interrupt which fires before kthread_bind() is invoked might
> exactly have that effect. It wakes it from the kthread() wait and it
> goes straight through to the thread function.
> 
> That's why the original code just set the affinity bit and let the
> thread itself handle it.
> 
> There are three options to solve that:
> 
>       1) Bind the thread right after kthread_create() to a random
>          housekeeping task and let move itself over to the real place
>          once it came up (at this point the affinity is established)
> 
>       2) Serialize the setup so that the thread (even, when woken up
>          early) get's stuck in UNINTERRUPTIBLE state _before_ it can
>          reach wait_for_interrupt() which waits with INTERRUPTIBLE
> 
>       3) Teach kthread_create() that the thread is subject to being
>          bound to something later on and implement that serialization
>          there.
> 
> #1 is pretty straight forward. See untested below.

Right.

> 
> #2 is ugly (I tried)

Ok...

> 
> #3 could be useful in general, but that needs more thoughts

A lot of thought, yes, given the many things to consider (kthread
automatic affinity handling, etc.).

And if we do that, we must always affine the IRQ threads remotely
to avoid conflicting with them. But that can be a problem with locking,
etc.

> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -133,7 +133,15 @@ void __irq_wake_thread(struct irq_desc *
>  	 */
>  	atomic_inc(&desc->threads_active);
>  
> -	wake_up_process(action->thread);
> +	/*
> +	 * This might be a premature wakeup before the thread reached the
> +	 * thread function and set the IRQTF_READY bit. It's waiting in
> +	 * kthread code with state UNINTERRUPTIBLE. Once it reaches the
> +	 * thread function it waits with INTERRUPTIBLE. The wakeup is not
> +	 * lost in that case because the thread is guaranteed to observe
> +	 * the RUN flag before it goes to sleep in wait_for_interrupt().
> +	 */
> +	wake_up_state(action->thread, TASK_INTERRUPTIBLE);
>  }
>  
>  static DEFINE_STATIC_KEY_FALSE(irqhandler_duration_check_enabled);
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1026,16 +1026,8 @@ static void irq_thread_check_affinity(st
>  	set_cpus_allowed_ptr(current, mask);
>  	free_cpumask_var(mask);
>  }
> -
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc)
> -{
> -	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
> -}
>  #else
>  static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc) { }
>  #endif
>  
>  static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1220,7 +1212,6 @@ static void wake_up_and_wait_for_irq_thr
>  	if (!action || !action->thread)
>  		return;
>  
> -	irq_thread_set_affinity(action->thread, desc);
>  	wake_up_process(action->thread);
>  	wait_event(desc->wait_for_threads,
>  		   test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1389,26 +1380,35 @@ static void irq_nmi_teardown(struct irq_
>  static int
>  setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>  {
> +	/*
> +	 * At this point interrupt affinity is not known. just assume that
> +	 * the current CPU is not isolated and valid to bring the thread
> +	 * up. The thread will move itself over to the right place once the
> +	 * whole setup is complete.
> +	 */
> +	unsigned int cpu = raw_smp_processor_id();
>  	struct task_struct *t;
>  
> -	if (!secondary) {
> -		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
> -				   new->name);
> -	} else {
> -		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
> -				   new->name);
> -	}
> +	if (!secondary)
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-%s", irq, new->name);
> +	else
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-s-%s", irq, new->name);

Right, I thought about something like that; it involved:

kthread_bind_mask(t, cpu_possible_mask);

Which do you prefer? Also, do you prefer such a fixup, or should I refactor
the patches you merged?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Thomas Gleixner 1 week, 4 days ago
On Thu, Nov 20 2025 at 16:50, Frederic Weisbecker wrote:
> Le Thu, Nov 20, 2025 at 04:00:39PM +0100, Thomas Gleixner a écrit :
>> +	if (!secondary)
>> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-%s", irq, new->name);
>> +	else
>> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-s-%s", irq, new->name);
>
> Right I though about something like that, it involved:
>
> kthread_bind_mask(t, cpu_possible_mask);

That's way simpler and also solves the problem with the
kthread_create_on_cpu() name which Marek pointed out.

> Which do you prefer? Also do you prefer such a fixup or should I refactor my
> patches you merged?

Can you split out the wakeup change into a separate patch
(Suggested-by-me) with its own changelog and fold the
kthread_bind_mask() + set(AFFINITY) bit into this one.
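
I.e. something like this (untested sketch of the folded result; the
final version may differ):

static int
setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
{
	struct task_struct *t;

	if (!secondary)
		t = kthread_create(irq_thread, new, "irq/%d-%s", irq, new->name);
	else
		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq, new->name);

	if (IS_ERR(t))
		return PTR_ERR(t);

	new->thread = get_task_struct(t);

	/*
	 * Bind to all possible CPUs so PF_NO_SETAFFINITY gets set without
	 * depending on the not yet established effective affinity, and let
	 * the thread place itself via IRQTF_AFFINITY once it comes up.
	 */
	kthread_bind_mask(t, cpu_possible_mask);
	set_bit(IRQTF_AFFINITY, &new->thread_flags);
	return 0;
}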

I'll just go and zap the existing commits (they are on top of the branch).

Thanks,

        tglx
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Waiman Long 1 week, 6 days ago
On 11/18/25 9:30 AM, Frederic Weisbecker wrote:
> When a cpuset isolated partition is created / updated or destroyed,
> the IRQ threads are affine blindly to all the non-isolated CPUs. And
> this happens without taking into account the IRQ thread initial
> affinity that becomes ignored.
>
> For example in a system with 8 CPUs, if an IRQ and its kthread are
> initially affine to CPU 5, creating an isolated partition with only
> CPU 2 inside will eventually end up affining the IRQ kthread to all
> CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
> CPU 5.
>
> Besides the blind re-affinity, this doesn't take care of the actual
> low level interrupt which isn't migrated. As of today the only way to
> isolate non managed interrupts, along with their kthreads, is to
> overwrite their affinity separately, for example through /proc/irq/
>
> To avoid doing that manually, future development should focus on
> updating the IRQs affinity whenever cpuset isolated partitions are
> updated.
>
> In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
> To prevent from that, set the PF_NO_SETAFFINITY flag to them.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>   kernel/irq/manage.c | 26 +++++++++++++-------------
>   1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 400856abf672..76e2cbe21d1f 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -176,7 +176,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
>   }
>   
>   /**
> - * irq_set_thread_affinity - Notify irq threads to adjust affinity
> + * irq_thread_update_affinity - Notify irq threads to adjust affinity
>    * @desc:	irq descriptor which has affinity changed
>    *
>    * Just set IRQTF_AFFINITY and delegate the affinity setting to the
> @@ -184,7 +184,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
>    * we hold desc->lock and this code can be called from hard interrupt
>    * context.
>    */
> -static void irq_set_thread_affinity(struct irq_desc *desc)
> +static void irq_thread_update_affinity(struct irq_desc *desc)
>   {
>   	struct irqaction *action;
>   
> @@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>   		fallthrough;
>   	case IRQ_SET_MASK_OK_NOCOPY:
>   		irq_validate_effective_affinity(data);
> -		irq_set_thread_affinity(desc);
> +		irq_thread_update_affinity(desc);
>   		ret = 0;
>   	}
>   
> @@ -1035,8 +1035,16 @@ static void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *a
>   		set_cpus_allowed_ptr(current, mask);
>   	free_cpumask_var(mask);
>   }
> +
> +static inline void irq_thread_set_affinity(struct task_struct *t,
> +					   struct irq_desc *desc)
> +{
> +	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
> +}

According to irq_thread_check_affinity(), accessing the cpumask returned 
from irq_data_get_effective_affinity_mask(&desc->irq_data) requires 
taking desc->lock to ensure its stability. Do we need something similar 
here? Other than that, it looks good to me.

Cheers,
Longman

>   #else
>   static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> +static inline void irq_thread_set_affinity(struct task_struct *t,
> +					   struct irq_desc *desc) { }
>   #endif
>   
>   static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1221,6 +1229,7 @@ static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
>   	if (!action || !action->thread)
>   		return;
>   
> +	irq_thread_set_affinity(action->thread, desc);
>   	wake_up_process(action->thread);
>   	wait_event(desc->wait_for_threads,
>   		   test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1405,16 +1414,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>   	 * references an already freed task_struct.
>   	 */
>   	new->thread = get_task_struct(t);
> -	/*
> -	 * Tell the thread to set its affinity. This is
> -	 * important for shared interrupt handlers as we do
> -	 * not invoke setup_affinity() for the secondary
> -	 * handlers as everything is already set up. Even for
> -	 * interrupts marked with IRQF_NO_BALANCE this is
> -	 * correct as we want the thread to move to the cpu(s)
> -	 * on which the requesting code placed the interrupt.
> -	 */
> -	set_bit(IRQTF_AFFINITY, &new->thread_flags);
> +
>   	return 0;
>   }
>
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Thomas Gleixner 1 week, 6 days ago
On Tue, Nov 18 2025 at 11:27, Waiman Long wrote:
>> +static inline void irq_thread_set_affinity(struct task_struct *t,
>> +					   struct irq_desc *desc)
>> +{
>> +	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
>> +}
>
> According to irq_thread_check_affinity(), accessing the cpumask returned 
> from irq_data_get_effective_affinity_mask(&desc->irq_data) requires 
> taking desc->lock to ensure its stability. Do we need something similar 
> here? Other than that, it looks good to me.

That's during interrupt setup, so it should be stable (famous last words).
Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated partitions
Posted by Waiman Long 1 week, 6 days ago
On 11/18/25 12:23 PM, Thomas Gleixner wrote:
> On Tue, Nov 18 2025 at 11:27, Waiman Long wrote:
>>> +static inline void irq_thread_set_affinity(struct task_struct *t,
>>> +					   struct irq_desc *desc)
>>> +{
>>> +	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
>>> +}
>> According to irq_thread_check_affinity(), accessing the cpumask returned
>> from irq_data_get_effective_affinity_mask(&desc->irq_data) requires
>> taking desc->lock to ensure its stability. Do we need something similar
>> here? Other than that, it looks good to me.
> That's during interrupt setup so it should be stable (famous last words)

Thanks for the clarification. We should probably add a comment to 
mention that.
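
Something along these lines, perhaps (sketch only):

static inline void irq_thread_set_affinity(struct task_struct *t,
					   struct irq_desc *desc)
{
	/*
	 * No desc->lock needed: this runs during interrupt setup, before
	 * the interrupt is started up, so the effective affinity mask
	 * cannot change under us yet.
	 */
	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
}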

Cheers,
Longman
[tip: irq/core] genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
Posted by tip-bot2 for Frederic Weisbecker 1 week, 6 days ago
The following commit has been merged into the irq/core branch of tip:

Commit-ID:     844dcacab287a46862a244ed413df26ef77400a6
Gitweb:        https://git.kernel.org/tip/844dcacab287a46862a244ed413df26ef77400a6
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Tue, 18 Nov 2025 15:30:51 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Nov 2025 16:19:40 +01:00

genirq: Fix interrupt threads affinity vs. cpuset isolated partitions

When a cpuset isolated partition is created / updated or destroyed, the
interrupt threads are affine blindly to all the non-isolated CPUs. And this
happens without taking into account the interrupt threads initial affinity
that becomes ignored.

For example in a system with 8 CPUs, if an interrupt and its kthread are
initially affine to CPU 5, creating an isolated partition with only CPU 2
inside will eventually end up affining the interrupt kthread to all CPUs
but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for CPU 5.

Besides the blind re-affinity, this doesn't take care of the actual low
level interrupt which isn't migrated. As of today the only way to isolate
non managed interrupts, along with their kthreads, is to overwrite their
affinity separately, for example through /proc/irq/

To avoid doing that manually, future development should focus on updating
the interrupt's affinity whenever cpuset isolated partitions are updated.

In the meantime, cpuset shouldn't fiddle with interrupt threads directly.
To prevent from that, set the PF_NO_SETAFFINITY flag to them.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251118143052.68778-2-frederic@kernel.org
---
 kernel/irq/manage.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c1ce30c..3c55805 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -176,7 +176,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
 }
 
 /**
- * irq_set_thread_affinity - Notify irq threads to adjust affinity
+ * irq_thread_update_affinity - Notify irq threads to adjust affinity
  * @desc:	irq descriptor which has affinity changed
  *
  * Just set IRQTF_AFFINITY and delegate the affinity setting to the
@@ -184,7 +184,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
  * we hold desc->lock and this code can be called from hard interrupt
  * context.
  */
-static void irq_set_thread_affinity(struct irq_desc *desc)
+static void irq_thread_update_affinity(struct irq_desc *desc)
 {
 	struct irqaction *action;
 
@@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		fallthrough;
 	case IRQ_SET_MASK_OK_NOCOPY:
 		irq_validate_effective_affinity(data);
-		irq_set_thread_affinity(desc);
+		irq_thread_update_affinity(desc);
 		ret = 0;
 	}
 
@@ -1035,8 +1035,16 @@ static void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *a
 		set_cpus_allowed_ptr(current, mask);
 	free_cpumask_var(mask);
 }
+
+static inline void irq_thread_set_affinity(struct task_struct *t,
+					   struct irq_desc *desc)
+{
+	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
+}
 #else
 static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
+static inline void irq_thread_set_affinity(struct task_struct *t,
+					   struct irq_desc *desc) { }
 #endif
 
 static int irq_wait_for_interrupt(struct irq_desc *desc,
@@ -1221,6 +1229,7 @@ static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
 	if (!action || !action->thread)
 		return;
 
+	irq_thread_set_affinity(action->thread, desc);
 	wake_up_process(action->thread);
 	wait_event(desc->wait_for_threads,
 		   test_bit(IRQTF_READY, &action->thread_flags));
@@ -1408,16 +1417,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 	 * references an already freed task_struct.
 	 */
 	new->thread = get_task_struct(t);
-	/*
-	 * Tell the thread to set its affinity. This is
-	 * important for shared interrupt handlers as we do
-	 * not invoke setup_affinity() for the secondary
-	 * handlers as everything is already set up. Even for
-	 * interrupts marked with IRQF_NO_BALANCE this is
-	 * correct as we want the thread to move to the cpu(s)
-	 * on which the requesting code placed the interrupt.
-	 */
-	set_bit(IRQTF_AFFINITY, &new->thread_flags);
+
 	return 0;
 }