From nobody Mon Dec 1 22:05:13 2025
From: Radu Rendec
To: Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru, Bjorn Helgaas,
    Rob Herring, Krzysztof Wilczyński, Lorenzo Pieralisi, Jingoo Han,
    Brian Masney, Eric Chanudet, Alessandro Carminati, Jared Kangas,
    linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/3] genirq: Add interrupt redirection infrastructure
Date: Fri, 28 Nov 2025 16:20:53 -0500
Message-ID: <20251128212055.1409093-2-rrendec@redhat.com>
In-Reply-To: <20251128212055.1409093-1-rrendec@redhat.com>
References: <20251128212055.1409093-1-rrendec@redhat.com>

Add infrastructure to redirect interrupt handler execution to a different
CPU when the current CPU is not part of the interrupt's CPU affinity mask.

This is primarily aimed at (de)multiplexed interrupts, where the child
interrupt handler runs in the context of the parent interrupt handler, and
therefore CPU affinity control for the child interrupt is typically not
available.

With the new infrastructure, the child interrupt is allowed to change its
affinity setting freely, independently of the parent. If the interrupt
handler happens to be triggered on an "incompatible" CPU (a CPU that is not
part of the child interrupt's affinity mask), the handler is redirected and
runs in IRQ work context on a "compatible" CPU.

No functional change is made to any existing irqchip driver; irqchip
drivers must be explicitly modified to use the newly added infrastructure
in order to support interrupt redirection.

Originally-by: Thomas Gleixner
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Signed-off-by: Radu Rendec
---
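Illustrative sketch (not part of the diff below): how a demultiplexing
irqchip driver could opt into the new infrastructure. The foo_* driver, its
register layout and its mask/unmask callbacks are hypothetical placeholders;
generic_handle_demux_domain_irq() and irq_chip_redirect_set_affinity() are
the helpers added by this patch.

#define FOO_ISR		0x10	/* hypothetical status register offset */
#define FOO_NR_IRQS	32

struct foo_priv {
	void __iomem		*base;
	struct irq_domain	*domain;
};

/* Chained parent handler: demultiplex and let the core redirect if needed */
static void foo_demux_handler(struct irq_desc *desc)
{
	struct foo_priv *priv = irq_desc_get_handler_data(desc);
	struct irq_chip *chip = irq_desc_get_chip(desc);
	unsigned long status;
	unsigned int bit;

	chained_irq_enter(chip, desc);

	status = readl(priv->base + FOO_ISR);
	/*
	 * Each child handler runs right here when the current CPU is in the
	 * child's affinity mask, and is queued as IRQ work on a suitable CPU
	 * otherwise.
	 */
	for_each_set_bit(bit, &status, FOO_NR_IRQS)
		generic_handle_demux_domain_irq(priv->domain, bit);

	chained_irq_exit(chip, desc);
}

static struct irq_chip foo_child_chip = {
	.name			= "foo-demux",
	.irq_mask		= foo_child_mask,	/* hypothetical */
	.irq_unmask		= foo_child_unmask,	/* hypothetical */
#ifdef CONFIG_SMP
	/* Only records the redirect target; no hardware routing is touched */
	.irq_set_affinity	= irq_chip_redirect_set_affinity,
#endif
};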
 include/linux/irq.h     | 10 +++++
 include/linux/irqdesc.h | 17 +++++++-
 kernel/irq/chip.c       | 22 ++++++++++-
 kernel/irq/irqdesc.c    | 86 ++++++++++++++++++++++++++++++++++++++++-
 kernel/irq/manage.c     | 15 ++++++-
 5 files changed, 144 insertions(+), 6 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index c67e76fbcc077..b6966747d88ca 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -459,6 +459,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  *			checks against the supplied affinity mask are not
  *			required. This is used for CPU hotplug where the
  *			target CPU is not yet set in the cpu_online_mask.
+ * @irq_pre_redirect:	Optional function to be invoked before redirecting
+ *			an interrupt via irq_work. Called only on CONFIG_SMP.
  * @irq_retrigger:	resend an IRQ to the CPU
  * @irq_set_type:	set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
  * @irq_set_wake:	enable/disable power-management wake-on of an IRQ
@@ -503,6 +505,7 @@ struct irq_chip {
 	void		(*irq_eoi)(struct irq_data *data);
 
 	int		(*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
+	void		(*irq_pre_redirect)(struct irq_data *data);
 	int		(*irq_retrigger)(struct irq_data *data);
 	int		(*irq_set_type)(struct irq_data *data, unsigned int flow_type);
 	int		(*irq_set_wake)(struct irq_data *data, unsigned int on);
@@ -688,6 +691,13 @@ extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data,
 extern int irq_chip_set_type_parent(struct irq_data *data, unsigned int type);
 extern int irq_chip_request_resources_parent(struct irq_data *data);
 extern void irq_chip_release_resources_parent(struct irq_data *data);
+#ifdef CONFIG_SMP
+void irq_chip_pre_redirect_parent(struct irq_data *data);
+#endif
+#endif
+
+#ifdef CONFIG_SMP
+int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force);
 #endif
 
 /* Disable or mask interrupts during a kernel kexec */
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index fd091c35d5721..620ddd3951751 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -2,9 +2,10 @@
 #ifndef _LINUX_IRQDESC_H
 #define _LINUX_IRQDESC_H
 
-#include
+#include
 #include
 #include
+#include
 
 /*
  * Core internal functions to deal with irq descriptors
@@ -29,6 +30,17 @@ struct irqstat {
 #endif
 };
 
+/**
+ * struct irq_redirect - interrupt redirection metadata
+ * @work:	HARD work item for handler execution on a different cpu
+ * @target_cpu:	cpu to run irq handler on in case the current cpu is not part
+ *		of the irq affinity mask
+ */
+struct irq_redirect {
+	struct irq_work		work;
+	unsigned int		target_cpu;
+};
+
 /**
  * struct irq_desc - interrupt descriptor
  * @irq_common_data:	per irq and chip data passed down to chip functions
@@ -46,6 +58,7 @@ struct irqstat {
  * @threads_handled:	stats field for deferred spurious detection of threaded handlers
  * @threads_handled_last:	comparator field for deferred spurious detection of threaded handlers
  * @lock:		locking for SMP
+ * @redirect:		Facility for redirecting interrupts via irq_work
  * @affinity_hint:	hint to user space for preferred irq affinity
  * @affinity_notify:	context for notification of affinity changes
  * @pending_mask:	pending rebalanced interrupts
@@ -84,6 +97,7 @@ struct irq_desc {
 	struct cpumask		*percpu_enabled;
 	const struct cpumask	*percpu_affinity;
 #ifdef CONFIG_SMP
+	struct irq_redirect	redirect;
 	const struct cpumask	*affinity_hint;
 	struct irq_affinity_notify *affinity_notify;
 #ifdef CONFIG_GENERIC_PENDING_IRQ
@@ -186,6 +200,7 @@ int generic_handle_irq_safe(unsigned int irq);
 int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq);
 int generic_handle_domain_irq_safe(struct irq_domain *domain, unsigned int hwirq);
 int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq);
+bool generic_handle_demux_domain_irq(struct irq_domain *domain, unsigned int hwirq);
 #endif
 
 /* Test to see if a driver has successfully requested an irq */
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index d1917b28761a3..d5c3f6ee24cc2 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1143,7 +1143,7 @@ void irq_cpu_offline(void)
 }
 #endif
 
-#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
 
 #ifdef CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS
 /**
@@ -1215,6 +1215,15 @@ EXPORT_SYMBOL_GPL(handle_fasteoi_mask_irq);
 
 #endif /* CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS */
 
+#ifdef CONFIG_SMP
+void irq_chip_pre_redirect_parent(struct irq_data *data)
+{
+	data = data->parent_data;
+	data->chip->irq_pre_redirect(data);
+}
+EXPORT_SYMBOL_GPL(irq_chip_pre_redirect_parent);
+#endif
+
 /**
  * irq_chip_set_parent_state - set the state of a parent interrupt.
  *
@@ -1497,6 +1506,17 @@ void irq_chip_release_resources_parent(struct irq_data *data)
 		data->chip->irq_release_resources(data);
 }
 EXPORT_SYMBOL_GPL(irq_chip_release_resources_parent);
+#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
+
+#ifdef CONFIG_SMP
+int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force)
+{
+	struct irq_redirect *redir = &irq_data_to_desc(data)->redirect;
+
+	WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
+	return IRQ_SET_MASK_OK;
+}
+EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
 #endif
 
 /**
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index db714d3014b5f..d3d4e7cf12937 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -78,8 +78,12 @@ static int alloc_masks(struct irq_desc *desc, int node)
 	return 0;
 }
 
-static void desc_smp_init(struct irq_desc *desc, int node,
-			  const struct cpumask *affinity)
+static void irq_redirect_work(struct irq_work *work)
+{
+	handle_irq_desc(container_of(work, struct irq_desc, redirect.work));
+}
+
+static void desc_smp_init(struct irq_desc *desc, int node, const struct cpumask *affinity)
 {
 	if (!affinity)
 		affinity = irq_default_affinity;
@@ -91,6 +95,7 @@ static void desc_smp_init(struct irq_desc *desc, int node,
 #ifdef CONFIG_NUMA
 	desc->irq_common_data.node = node;
 #endif
+	desc->redirect.work = IRQ_WORK_INIT_HARD(irq_redirect_work);
 }
 
 static void free_masks(struct irq_desc *desc)
@@ -766,6 +771,83 @@ int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq)
 	WARN_ON_ONCE(!in_nmi());
 	return handle_irq_desc(irq_resolve_mapping(domain, hwirq));
 }
+
+#ifdef CONFIG_SMP
+static bool demux_redirect_remote(struct irq_desc *desc)
+{
+	guard(raw_spinlock)(&desc->lock);
+	const struct cpumask *m = irq_data_get_effective_affinity_mask(&desc->irq_data);
+	unsigned int target_cpu = READ_ONCE(desc->redirect.target_cpu);
+
+	if (desc->irq_data.chip->irq_pre_redirect)
+		desc->irq_data.chip->irq_pre_redirect(&desc->irq_data);
+
+	/*
+	 * If the interrupt handler is already running on a CPU that's included
+	 * in the interrupt's affinity mask, redirection is not necessary.
+	 */
+	if (cpumask_test_cpu(smp_processor_id(), m))
+		return false;
+
+	/*
+	 * The desc->action check protects against IRQ shutdown: __free_irq() sets
+	 * desc->action to NULL while holding desc->lock, which we also hold.
+	 *
+	 * Calling irq_work_queue_on() here is safe w.r.t. CPU unplugging:
+	 * - takedown_cpu() schedules multi_cpu_stop() on all active CPUs,
+	 *   including the one that's taken down.
+	 * - multi_cpu_stop() acts like a barrier, which means all active
+	 *   CPUs go through MULTI_STOP_DISABLE_IRQ and disable hard IRQs
+	 *   *before* the dying CPU runs take_cpu_down() in MULTI_STOP_RUN.
+	 * - Hard IRQs are re-enabled at the end of multi_cpu_stop(), *after*
+	 *   the dying CPU has run take_cpu_down() in MULTI_STOP_RUN.
+	 * - Since we run in hard IRQ context, we run either before or after
+	 *   take_cpu_down() but never concurrently.
+	 * - If we run before take_cpu_down(), the dying CPU hasn't been marked
+	 *   offline yet (it's marked via take_cpu_down() -> __cpu_disable()),
+	 *   so the WARN in irq_work_queue_on() can't occur.
	 * - Furthermore, the work item we queue will be flushed later via
	 *   take_cpu_down() -> cpuhp_invoke_callback_range_nofail() ->
	 *   smpcfd_dying_cpu() -> irq_work_run().
	 * - If we run after take_cpu_down(), target_cpu has been already
	 *   updated via take_cpu_down() -> __cpu_disable(), which eventually
	 *   calls irq_do_set_affinity() during IRQ migration. So, target_cpu
	 *   no longer points to the dying CPU in this case.
	 */
+	if (desc->action)
+		irq_work_queue_on(&desc->redirect.work, target_cpu);
+
+	return true;
+}
+#else /* CONFIG_SMP */
+static bool demux_redirect_remote(struct irq_desc *desc)
+{
+	return false;
+}
+#endif
+
+/**
+ * generic_handle_demux_domain_irq - Invoke the handler for a hardware interrupt
+ * of a demultiplexing domain.
+ * @domain:	The domain where to perform the lookup
+ * @hwirq:	The hardware interrupt number to convert to a logical one
+ *
+ * Returns: True on success, or false if lookup has failed
+ */
+bool generic_handle_demux_domain_irq(struct irq_domain *domain, unsigned int hwirq)
+{
+	struct irq_desc *desc = irq_resolve_mapping(domain, hwirq);
+
+	if (unlikely(!desc))
+		return false;
+
+	if (demux_redirect_remote(desc))
+		return true;
+
+	return !handle_irq_desc(desc);
+}
+EXPORT_SYMBOL_GPL(generic_handle_demux_domain_irq);
+
 #endif
 
 /* Dynamic interrupt handling */
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 400856abf6721..0c06f37d8a203 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -35,6 +35,16 @@ static int __init setup_forced_irqthreads(char *arg)
 early_param("threadirqs", setup_forced_irqthreads);
 #endif
 
+#ifdef CONFIG_SMP
+static inline void synchronize_irqwork(struct irq_desc *desc)
+{
+	/* Synchronize pending or on the fly redirect work */
+	irq_work_sync(&desc->redirect.work);
+}
+#else
+static inline void synchronize_irqwork(struct irq_desc *desc) { }
+#endif
+
 static int __irq_get_irqchip_state(struct irq_data *d, enum irqchip_irq_state which, bool *state);
 
 static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
@@ -107,7 +117,9 @@ EXPORT_SYMBOL(synchronize_hardirq);
 
 static void __synchronize_irq(struct irq_desc *desc)
 {
+	synchronize_irqwork(desc);
 	__synchronize_hardirq(desc, true);
+
 	/*
 	 * We made sure that no hardirq handler is running. Now verify that no
 	 * threaded handlers are active.
@@ -217,8 +229,7 @@ static inline void irq_validate_effective_affinity(struct irq_data *data) { }
 
 static DEFINE_PER_CPU(struct cpumask, __tmp_mask);
 
-int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
-			bool force)
+int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
 {
 	struct cpumask *tmp_mask = this_cpu_ptr(&__tmp_mask);
 	struct irq_desc *desc = irq_data_to_desc(data);
-- 
2.51.1

From nobody Mon Dec 1 22:05:13 2025
From: Radu Rendec
To: Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru, Bjorn Helgaas,
    Rob Herring, Krzysztof Wilczyński, Lorenzo Pieralisi, Jingoo Han,
    Brian Masney, Eric Chanudet, Alessandro Carminati, Jared Kangas,
    linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/3] PCI: dwc: Code cleanup
Date: Fri, 28 Nov 2025 16:20:54 -0500
Message-ID: <20251128212055.1409093-3-rrendec@redhat.com>
In-Reply-To: <20251128212055.1409093-1-rrendec@redhat.com>
References: <20251128212055.1409093-1-rrendec@redhat.com>

Code cleanup with no functional changes. These changes were originally made
by Thomas Gleixner (see the Link tag below) in a patch that was never
submitted as-is. Other parts of that patch were eventually submitted as
commit 8e717112caf3 ("PCI: dwc: Switch to msi_create_parent_irq_domain()");
the remaining parts are the code cleanup changes in this patch.

Summary of changes:
 - Use guard()/scoped_guard() instead of open-coded lock/unlock.
 - Return void in a few functions whose return value is never used.
 - Simplify dw_handle_msi_irq() by using for_each_set_bit().

One notable deviation from the original patch is that I reverted to a
simple one-by-one iteration over the controllers inside dw_handle_msi_irq().
The reason is that with the original changes, the IRQ offset was calculated
incorrectly.

This patch also prepares the ground for the next patch in the series, which
enables MSI affinity support, and was originally part of that same series
that Thomas Gleixner prepared.

Originally-by: Thomas Gleixner
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Signed-off-by: Radu Rendec
---
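Illustrative sketch (not part of the diff below) of the guard() conversion
pattern, on a simplified stand-in function rather than code copied from the
driver; it assumes the driver's struct dw_pcie_rp with its raw spinlock and
irq_mask[] array. The conversion relies on these irqchip callbacks being
invoked with interrupts already disabled, which is why the plain
raw_spinlock guard replaces the _irqsave variant; dw_pcie_irq_domain_alloc()
and _free() below use the raw_spinlock_irq guard/scoped_guard() forms
instead.

/* Before: open-coded locking with an explicit flags variable */
static void example_mask_before(struct dw_pcie_rp *pp, unsigned int bit)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&pp->lock, flags);
	pp->irq_mask[0] |= BIT(bit);
	raw_spin_unlock_irqrestore(&pp->lock, flags);
}

/* After: guard() releases the lock automatically when the scope is left */
static void example_mask_after(struct dw_pcie_rp *pp, unsigned int bit)
{
	guard(raw_spinlock)(&pp->lock);
	pp->irq_mask[0] |= BIT(bit);
}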
 .../pci/controller/dwc/pcie-designware-host.c | 98 ++++++-------------
 drivers/pci/controller/dwc/pcie-designware.h  |  7 +-
 2 files changed, 34 insertions(+), 71 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index e92513c5bda51..aa93acaa579a5 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -46,35 +46,25 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
 };
 
 /* MSI int handler */
-irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
+void dw_handle_msi_irq(struct dw_pcie_rp *pp)
 {
-	int i, pos;
-	unsigned long val;
-	u32 status, num_ctrls;
-	irqreturn_t ret = IRQ_NONE;
 	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	unsigned int i, num_ctrls;
 
 	num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
 
 	for (i = 0; i < num_ctrls; i++) {
-		status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
-					   (i * MSI_REG_CTRL_BLOCK_SIZE));
+		unsigned int reg_off = i * MSI_REG_CTRL_BLOCK_SIZE;
+		unsigned int irq_off = i * MAX_MSI_IRQS_PER_CTRL;
+		unsigned long status, pos;
+
+		status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS + reg_off);
 		if (!status)
 			continue;
 
-		ret = IRQ_HANDLED;
-		val = status;
-		pos = 0;
-		while ((pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL,
-					    pos)) != MAX_MSI_IRQS_PER_CTRL) {
-			generic_handle_domain_irq(pp->irq_domain,
-						  (i * MAX_MSI_IRQS_PER_CTRL) +
-						  pos);
-			pos++;
-		}
+		for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
+			generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
 	}
-
-	return ret;
 }
 
 /* Chained MSI interrupt service routine */
@@ -95,13 +85,10 @@ static void dw_pci_setup_msi_msg(struct irq_data *d, struct msi_msg *msg)
 {
 	struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
 	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-	u64 msi_target;
-
-	msi_target = (u64)pp->msi_data;
+	u64 msi_target = (u64)pp->msi_data;
 
 	msg->address_lo = lower_32_bits(msi_target);
 	msg->address_hi = upper_32_bits(msi_target);
-
 	msg->data = d->hwirq;
 
 	dev_dbg(pci->dev, "msi#%d address_hi %#x address_lo %#x\n",
@@ -113,18 +100,14 @@ static void dw_pci_bottom_mask(struct irq_data *d)
 	struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
 	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
 	unsigned int res, bit, ctrl;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&pp->lock, flags);
 
+	guard(raw_spinlock)(&pp->lock);
 	ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
 	res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
 	bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
 
 	pp->irq_mask[ctrl] |= BIT(bit);
 	dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, pp->irq_mask[ctrl]);
-
-	raw_spin_unlock_irqrestore(&pp->lock, flags);
 }
 
 static void dw_pci_bottom_unmask(struct irq_data *d)
@@ -132,18 +115,14 @@ static void dw_pci_bottom_unmask(struct irq_data *d)
 	struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
 	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
 	unsigned int res, bit, ctrl;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&pp->lock, flags);
 
+	guard(raw_spinlock)(&pp->lock);
 	ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
 	res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
 	bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
 
 	pp->irq_mask[ctrl] &= ~BIT(bit);
 	dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, pp->irq_mask[ctrl]);
-
-	raw_spin_unlock_irqrestore(&pp->lock, flags);
 }
 
 static void dw_pci_bottom_ack(struct irq_data *d)
@@ -160,54 +139,42 @@ static void dw_pci_bottom_ack(struct irq_data *d)
 }
 
 static struct irq_chip dw_pci_msi_bottom_irq_chip = {
-	.name = "DWPCI-MSI",
-	.irq_ack = dw_pci_bottom_ack,
-	.irq_compose_msi_msg = dw_pci_setup_msi_msg,
-	.irq_mask = dw_pci_bottom_mask,
-	.irq_unmask = dw_pci_bottom_unmask,
+	.name			= "DWPCI-MSI",
+	.irq_ack		= dw_pci_bottom_ack,
+	.irq_compose_msi_msg	= dw_pci_setup_msi_msg,
+	.irq_mask		= dw_pci_bottom_mask,
+	.irq_unmask		= dw_pci_bottom_unmask,
 };
 
-static int dw_pcie_irq_domain_alloc(struct irq_domain *domain,
-				    unsigned int virq, unsigned int nr_irqs,
-				    void *args)
+static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs, void *args)
 {
 	struct dw_pcie_rp *pp = domain->host_data;
-	unsigned long flags;
-	u32 i;
 	int bit;
 
-	raw_spin_lock_irqsave(&pp->lock, flags);
-
-	bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
-				      order_base_2(nr_irqs));
-
-	raw_spin_unlock_irqrestore(&pp->lock, flags);
+	scoped_guard (raw_spinlock_irq, &pp->lock) {
+		bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
+					      order_base_2(nr_irqs));
+	}
 
 	if (bit < 0)
 		return -ENOSPC;
 
-	for (i = 0; i < nr_irqs; i++)
-		irq_domain_set_info(domain, virq + i, bit + i,
-				    pp->msi_irq_chip,
-				    pp, handle_edge_irq,
-				    NULL, NULL);
-
+	for (unsigned int i = 0; i < nr_irqs; i++) {
+		irq_domain_set_info(domain, virq + i, bit + i, pp->msi_irq_chip,
				    pp, handle_edge_irq, NULL, NULL);
+	}
 	return 0;
 }
 
-static void dw_pcie_irq_domain_free(struct irq_domain *domain,
-				    unsigned int virq, unsigned int nr_irqs)
+static void dw_pcie_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs)
 {
 	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
 	struct dw_pcie_rp *pp = domain->host_data;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&pp->lock, flags);
 
-	bitmap_release_region(pp->msi_irq_in_use, d->hwirq,
-			      order_base_2(nr_irqs));
-
-	raw_spin_unlock_irqrestore(&pp->lock, flags);
+	guard(raw_spinlock_irq)(&pp->lock);
+	bitmap_release_region(pp->msi_irq_in_use, d->hwirq, order_base_2(nr_irqs));
 }
 
 static const struct irq_domain_ops dw_pcie_msi_domain_ops = {
@@ -240,8 +207,7 @@ void dw_pcie_free_msi(struct dw_pcie_rp *pp)
 
 	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++) {
 		if (pp->msi_irq[ctrl] > 0)
-			irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
-							 NULL, NULL);
+			irq_set_chained_handler_and_data(pp->msi_irq[ctrl], NULL, NULL);
 	}
 
 	irq_domain_remove(pp->irq_domain);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index e995f692a1ecd..ef212a56f60c5 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -802,7 +802,7 @@ static inline enum dw_pcie_ltssm dw_pcie_get_ltssm(struct dw_pcie *pci)
 #ifdef CONFIG_PCIE_DW_HOST
 int dw_pcie_suspend_noirq(struct dw_pcie *pci);
 int dw_pcie_resume_noirq(struct dw_pcie *pci);
-irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp);
+void dw_handle_msi_irq(struct dw_pcie_rp *pp);
 void dw_pcie_msi_init(struct dw_pcie_rp *pp);
 int dw_pcie_msi_host_init(struct dw_pcie_rp *pp);
 void dw_pcie_free_msi(struct dw_pcie_rp *pp);
@@ -823,10 +823,7 @@ static inline int dw_pcie_resume_noirq(struct dw_pcie *pci)
 	return 0;
 }
 
-static inline irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
-{
-	return IRQ_NONE;
-}
+static inline void dw_handle_msi_irq(struct dw_pcie_rp *pp) { }
 
 static inline void dw_pcie_msi_init(struct dw_pcie_rp *pp) { }
-- 
2.51.1

From nobody Mon Dec 1 22:05:13 2025
From: Radu Rendec
To: Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru, Bjorn Helgaas,
    Rob Herring, Krzysztof Wilczyński, Lorenzo Pieralisi, Jingoo Han,
    Brian Masney, Eric Chanudet, Alessandro Carminati, Jared Kangas,
    linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
Date: Fri, 28 Nov 2025 16:20:55 -0500
Message-ID: <20251128212055.1409093-4-rrendec@redhat.com>
In-Reply-To: <20251128212055.1409093-1-rrendec@redhat.com>
References: <20251128212055.1409093-1-rrendec@redhat.com>

Leverage the interrupt redirection infrastructure to enable CPU affinity
support for MSI interrupts. Since the parent interrupt's affinity cannot be
changed, affinity control for the child interrupt (MSI) is achieved by
redirecting the handler to run in IRQ work context on the target CPU.

This patch was originally prepared by Thomas Gleixner (see the Link tag
below) as part of a patch series that was never submitted as-is; only parts
of that series have made it upstream so far.

Originally-by: Thomas Gleixner
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Signed-off-by: Radu Rendec
---
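Illustrative sketch (not part of the diff below): with MSI_FLAG_NO_AFFINITY
dropped, the affinity of an individual MSI can now be changed, for example
from a consumer driver as sketched here, or from user space via
/proc/irq/<N>/smp_affinity. The helper below is hypothetical;
pci_irq_vector() and irq_set_affinity() are existing kernel APIs.

/* Hypothetical helper: steer one MSI vector of a device to a given CPU */
static int example_pin_msi_to_cpu(struct pci_dev *pdev, int vec, unsigned int cpu)
{
	int irq = pci_irq_vector(pdev, vec);

	if (irq < 0)
		return irq;

	/*
	 * Ends up in irq_chip_redirect_set_affinity(), which records the
	 * first CPU of the new mask as the redirect target. The parent
	 * (chained) interrupt keeps its own affinity; only the demultiplexed
	 * handler execution moves.
	 */
	return irq_set_affinity(irq, cpumask_of(cpu));
}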
 .../pci/controller/dwc/pcie-designware-host.c | 33 ++++++++++++++++---
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index aa93acaa579a5..90d9cb45e7842 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -26,9 +26,27 @@ static struct pci_ops dw_pcie_ops;
 static struct pci_ops dw_pcie_ecam_ops;
 static struct pci_ops dw_child_pcie_ops;
 
+#ifdef CONFIG_SMP
+static void dw_irq_noop(struct irq_data *d) { }
+#endif
+
+static bool dw_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				      struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+	if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+		return false;
+
+#ifdef CONFIG_SMP
+	info->chip->irq_ack = dw_irq_noop;
+	info->chip->irq_pre_redirect = irq_chip_pre_redirect_parent;
+#else
+	info->chip->irq_ack = irq_chip_ack_parent;
+#endif
+	return true;
+}
+
 #define DW_PCIE_MSI_FLAGS_REQUIRED  (MSI_FLAG_USE_DEF_DOM_OPS | \
 				     MSI_FLAG_USE_DEF_CHIP_OPS | \
-				     MSI_FLAG_NO_AFFINITY | \
 				     MSI_FLAG_PCI_MSI_MASK_PARENT)
 #define DW_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI | \
 				     MSI_FLAG_PCI_MSIX | \
@@ -40,9 +58,8 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
 	.required_flags		= DW_PCIE_MSI_FLAGS_REQUIRED,
 	.supported_flags	= DW_PCIE_MSI_FLAGS_SUPPORTED,
 	.bus_select_token	= DOMAIN_BUS_PCI_MSI,
-	.chip_flags		= MSI_CHIP_FLAG_SET_ACK,
 	.prefix			= "DW-",
-	.init_dev_msi_info	= msi_lib_init_dev_msi_info,
+	.init_dev_msi_info	= dw_pcie_init_dev_msi_info,
 };
 
 /* MSI int handler */
@@ -63,7 +80,7 @@ void dw_handle_msi_irq(struct dw_pcie_rp *pp)
 			continue;
 
 		for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
-			generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
+			generic_handle_demux_domain_irq(pp->irq_domain, irq_off + pos);
 	}
 }
 
@@ -140,10 +157,16 @@ static void dw_pci_bottom_ack(struct irq_data *d)
 
 static struct irq_chip dw_pci_msi_bottom_irq_chip = {
 	.name			= "DWPCI-MSI",
-	.irq_ack		= dw_pci_bottom_ack,
 	.irq_compose_msi_msg	= dw_pci_setup_msi_msg,
 	.irq_mask		= dw_pci_bottom_mask,
 	.irq_unmask		= dw_pci_bottom_unmask,
+#ifdef CONFIG_SMP
+	.irq_ack		= dw_irq_noop,
+	.irq_pre_redirect	= dw_pci_bottom_ack,
+	.irq_set_affinity	= irq_chip_redirect_set_affinity,
+#else
+	.irq_ack		= dw_pci_bottom_ack,
+#endif
 };
 
 static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
-- 
2.51.1