[PATCH v4] wifi: iwlwifi: pcie: optimize MSI-X interrupt affinity

Adrián García Casado posted 1 patch 2 weeks, 6 days ago
There is a newer version of this series
[PATCH v4] wifi: iwlwifi: pcie: optimize MSI-X interrupt affinity
Posted by Adrián García Casado 2 weeks, 6 days ago
Balanced distribution: skip CPU0 for high-rate RSS queues to avoid contention with system housekeeping. Use a stateful last_cpu approach to ensure unique core assignment when skipping CPU0. This avoids mapping multiple queues to the same core.

Signed-off-by: Adrián García Casado <adriangarciacasado42@gmail.com>
---
 .../intel/iwlwifi/pcie/gen1_2/trans.c         | 20 ++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c
index 4560d92d7..7077ec015 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c
@@ -1672,18 +1672,28 @@ static void iwl_pcie_irq_set_affinity(struct iwl_trans *trans,
 				      struct iwl_trans_info *info)
 {
 #if defined(CONFIG_SMP)
-	int iter_rx_q, i, ret, cpu, offset;
+	int iter_rx_q, i, ret, cpu, offset, last_cpu;
 	struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
 
 	i = trans_pcie->shared_vec_mask & IWL_SHARED_IRQ_FIRST_RSS ? 0 : 1;
 	iter_rx_q = info->num_rxqs - 1 + i;
-	offset = 1 + i;
+	last_cpu = -1;
 	for (; i < iter_rx_q ; i++) {
 		/*
-		 * Get the cpu prior to the place to search
-		 * (i.e. return will be > i - 1).
+		 * Balanced distribution: skip CPU0 for high-rate RSS queues
+		 * to avoid contention with system housekeeping.
 		 */
-		cpu = cpumask_next(i - offset, cpu_online_mask);
+		cpu = cpumask_next(last_cpu, cpu_online_mask);
+		if (cpu >= nr_cpu_ids)
+			cpu = cpumask_first(cpu_online_mask);
+
+		if (cpu == 0 && num_online_cpus() > 1) {
+			cpu = cpumask_next(0, cpu_online_mask);
+			if (cpu >= nr_cpu_ids)
+				cpu = cpumask_first(cpu_online_mask);
+		}
+		last_cpu = cpu;
+
 		cpumask_set_cpu(cpu, &trans_pcie->affinity_mask[i]);
 		ret = irq_set_affinity_hint(trans_pcie->msix_entries[i].vector,
 					    &trans_pcie->affinity_mask[i]);
-- 
2.47.3

Re: [PATCH v4] wifi: iwlwifi: pcie: optimize MSI-X interrupt affinity
Posted by kernel test robot 2 weeks, 5 days ago
Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on wireless-next/main]
[also build test WARNING on wireless/main rust/rust-next linus/master v7.0-rc4 next-20260317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Garc-a-Casado/wifi-iwlwifi-pcie-optimize-MSI-X-interrupt-affinity/20260318-081834
base:   https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git main
patch link:    https://lore.kernel.org/r/20260317193252.13763-1-adriangarciacasado42%40gmail.com
patch subject: [PATCH v4] wifi: iwlwifi: pcie: optimize MSI-X interrupt affinity
config: x86_64-rhel-9.4-kunit (https://download.01.org/0day-ci/archive/20260318/202603182147.ECKLrJRf-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260318/202603182147.ECKLrJRf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603182147.ECKLrJRf-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c: In function 'iwl_pcie_irq_set_affinity':
>> drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c:1675:37: warning: unused variable 'offset' [-Wunused-variable]
    1675 |         int iter_rx_q, i, ret, cpu, offset, last_cpu;
         |                                     ^~~~~~


vim +/offset +1675 drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c

  1670	
  1671	static void iwl_pcie_irq_set_affinity(struct iwl_trans *trans,
  1672					      struct iwl_trans_info *info)
  1673	{
  1674	#if defined(CONFIG_SMP)
> 1675		int iter_rx_q, i, ret, cpu, offset, last_cpu;
  1676		struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
  1677	
  1678		i = trans_pcie->shared_vec_mask & IWL_SHARED_IRQ_FIRST_RSS ? 0 : 1;
  1679		iter_rx_q = info->num_rxqs - 1 + i;
  1680		last_cpu = -1;
  1681		for (; i < iter_rx_q ; i++) {
  1682			/*
  1683			 * Balanced distribution: skip CPU0 for high-rate RSS queues
  1684			 * to avoid contention with system housekeeping.
  1685			 */
  1686			cpu = cpumask_next(last_cpu, cpu_online_mask);
  1687			if (cpu >= nr_cpu_ids)
  1688				cpu = cpumask_first(cpu_online_mask);
  1689	
  1690			if (cpu == 0 && num_online_cpus() > 1) {
  1691				cpu = cpumask_next(0, cpu_online_mask);
  1692				if (cpu >= nr_cpu_ids)
  1693					cpu = cpumask_first(cpu_online_mask);
  1694			}
  1695			last_cpu = cpu;
  1696	
  1697			cpumask_set_cpu(cpu, &trans_pcie->affinity_mask[i]);
  1698			ret = irq_set_affinity_hint(trans_pcie->msix_entries[i].vector,
  1699						    &trans_pcie->affinity_mask[i]);
  1700			if (ret)
  1701				IWL_ERR(trans_pcie->trans,
  1702					"Failed to set affinity mask for IRQ %d\n",
  1703					trans_pcie->msix_entries[i].vector);
  1704		}
  1705	#endif
  1706	}
  1707	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v4] wifi: iwlwifi: pcie: optimize MSI-X interrupt affinity
Posted by Johannes Berg 2 weeks, 5 days ago
On Tue, 2026-03-17 at 20:32 +0100, Adrián García Casado wrote:
> Balanced distribution: skip CPU0 for high-rate RSS queues to avoid contention with system housekeeping. Use a stateful last_cpu approach to ensure unique core assignment when skipping CPU0. This avoids mapping multiple queues to the same core.

You need to break lines ...

I tend to think you need a better reason to skip CPU0. Last time you
pretended it was actually going to be faster, now you pretend there's
contention, without ever really getting to any proof of that?


Also please read what I said before:

>> this is wrong since you really then should allocate one queue less,
>> rather than mapping two queues to the same core.

johannes