drivers/hv/channel_mgmt.c | 18 ++++++++++++------ drivers/hv/vmbus_drv.c | 4 ++++ 2 files changed, 16 insertions(+), 6 deletions(-)
Would this have impact for DPDK applications using isolated cpus?
-----Original Message-----
From: Saurabh Sengar <ssengar@linux.microsoft.com>
Sent: Friday, May 27, 2022 12:22 AM
To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; Saurabh Singh Sengar <ssengar@microsoft.com>; Michael Kelley (LINUX) <mikelley@microsoft.com>
Subject: [PATCH v2] Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
When initially assigning a VMbus channel interrupt to a CPU, don’t choose
a managed IRQ isolated CPU (as specified on the kernel boot line with
parameter 'isolcpus=managed_irq,<#cpu>'). Also, when using sysfs to change
the CPU that a VMbus channel will interrupt, don't allow changing to a
managed IRQ isolated CPU.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
v2: * better commit message
* Added back empty line, removed by mistake
* Removed error print for sysfs error
drivers/hv/channel_mgmt.c | 18 ++++++++++++------
drivers/hv/vmbus_drv.c | 4 ++++
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 97d8f56..e1fe029 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -21,6 +21,7 @@
#include <linux/cpu.h>
#include <linux/hyperv.h>
#include <asm/mshyperv.h>
+#include <linux/sched/isolation.h>
#include "hyperv_vmbus.h"
@@ -728,16 +729,20 @@ static void init_vp_index(struct vmbus_channel *channel)
u32 i, ncpu = num_online_cpus();
cpumask_var_t available_mask;
struct cpumask *allocated_mask;
+ const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
u32 target_cpu;
int numa_node;
if (!perf_chn ||
- !alloc_cpumask_var(&available_mask, GFP_KERNEL)) {
+ !alloc_cpumask_var(&available_mask, GFP_KERNEL) ||
+ cpumask_empty(hk_mask)) {
/*
* If the channel is not a performance critical
* channel, bind it to VMBUS_CONNECT_CPU.
* In case alloc_cpumask_var() fails, bind it to
* VMBUS_CONNECT_CPU.
+ * If all the cpus are isolated, bind it to
+ * VMBUS_CONNECT_CPU.
*/
channel->target_cpu = VMBUS_CONNECT_CPU;
if (perf_chn)
@@ -758,17 +763,19 @@ static void init_vp_index(struct vmbus_channel *channel)
}
allocated_mask = &hv_context.hv_numa_map[numa_node];
- if (cpumask_equal(allocated_mask, cpumask_of_node(numa_node))) {
+retry:
+ cpumask_xor(available_mask, allocated_mask, cpumask_of_node(numa_node));
+ cpumask_and(available_mask, available_mask, hk_mask);
+
+ if (cpumask_empty(available_mask)) {
/*
* We have cycled through all the CPUs in the node;
* reset the allocated map.
*/
cpumask_clear(allocated_mask);
+ goto retry;
}
- cpumask_xor(available_mask, allocated_mask,
- cpumask_of_node(numa_node));
-
target_cpu = cpumask_first(available_mask);
cpumask_set_cpu(target_cpu, allocated_mask);
@@ -778,7 +785,6 @@ static void init_vp_index(struct vmbus_channel *channel)
}
channel->target_cpu = target_cpu;
-
free_cpumask_var(available_mask);
}
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 714d549..547ae33 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -21,6 +21,7 @@
#include <linux/kernel_stat.h>
#include <linux/clockchips.h>
#include <linux/cpu.h>
+#include <linux/sched/isolation.h>
#include <linux/sched/task_stack.h>
#include <linux/delay.h>
@@ -1770,6 +1771,9 @@ static ssize_t target_cpu_store(struct vmbus_channel *channel,
if (target_cpu >= nr_cpumask_bits)
return -EINVAL;
+ if (!cpumask_test_cpu(target_cpu, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
+ return -EINVAL;
+
/* No CPUs should come up or down during this. */
cpus_read_lock();
--
1.8.3.1
From: Stephen Hemminger <sthemmin@microsoft.com> Sent: Friday, May 27, 2022 8:41 AM
>
> Would this have impact for DPDK applications using isolated cpus?
I don't have any existing knowledge of DPDK use of isolated CPUs,
so someone with more expertise feel free to correct me.
From what I see in the DPDK documentation (Section 8.3 here:
https://doc.dpdk.org/guides/linux_gsg/enable_func.html), there's
no impact. The example in that documentation does CPU isolation
only for the purpose of scheduling, not for interrupts. The
example kernel command line is:
isolcpus=2,4,6
which defaults to "domain" as the "flag" and is equivalent to:
isolcpus=domain,2,4,6.
VMbus channel interrupts are affected only if "managed_irq" is
specified as the flag per the commit message below.
And FWIW, cpusets provide a better way to doing scheduler
isolation than the isolcpus kernel boot option. Perhaps the
DPDK documentation should be updated. :-)
Michael
>
> -----Original Message-----
> From: Saurabh Sengar <ssengar@linux.microsoft.com>
> Sent: Friday, May 27, 2022 12:22 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org;
> Saurabh Singh Sengar <ssengar@microsoft.com>; Michael Kelley (LINUX)
> <mikelley@microsoft.com>
> Subject: [PATCH v2] Drivers: hv: vmbus: Don't assign VMbus channel interrupts to
> isolated CPUs
>
> When initially assigning a VMbus channel interrupt to a CPU, don’t choose
> a managed IRQ isolated CPU (as specified on the kernel boot line with
> parameter 'isolcpus=managed_irq,<#cpu>'). Also, when using sysfs to change
> the CPU that a VMbus channel will interrupt, don't allow changing to a
> managed IRQ isolated CPU.
>
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> v2: * better commit message
> * Added back empty line, removed by mistake
> * Removed error print for sysfs error
>
> drivers/hv/channel_mgmt.c | 18 ++++++++++++------
> drivers/hv/vmbus_drv.c | 4 ++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 97d8f56..e1fe029 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -21,6 +21,7 @@
> #include <linux/cpu.h>
> #include <linux/hyperv.h>
> #include <asm/mshyperv.h>
> +#include <linux/sched/isolation.h>
>
> #include "hyperv_vmbus.h"
>
> @@ -728,16 +729,20 @@ static void init_vp_index(struct vmbus_channel *channel)
> u32 i, ncpu = num_online_cpus();
> cpumask_var_t available_mask;
> struct cpumask *allocated_mask;
> + const struct cpumask *hk_mask =
> housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
> u32 target_cpu;
> int numa_node;
>
> if (!perf_chn ||
> - !alloc_cpumask_var(&available_mask, GFP_KERNEL)) {
> + !alloc_cpumask_var(&available_mask, GFP_KERNEL) ||
> + cpumask_empty(hk_mask)) {
> /*
> * If the channel is not a performance critical
> * channel, bind it to VMBUS_CONNECT_CPU.
> * In case alloc_cpumask_var() fails, bind it to
> * VMBUS_CONNECT_CPU.
> + * If all the cpus are isolated, bind it to
> + * VMBUS_CONNECT_CPU.
> */
> channel->target_cpu = VMBUS_CONNECT_CPU;
> if (perf_chn)
> @@ -758,17 +763,19 @@ static void init_vp_index(struct vmbus_channel *channel)
> }
> allocated_mask = &hv_context.hv_numa_map[numa_node];
>
> - if (cpumask_equal(allocated_mask, cpumask_of_node(numa_node))) {
> +retry:
> + cpumask_xor(available_mask, allocated_mask,
> cpumask_of_node(numa_node));
> + cpumask_and(available_mask, available_mask, hk_mask);
> +
> + if (cpumask_empty(available_mask)) {
> /*
> * We have cycled through all the CPUs in the node;
> * reset the allocated map.
> */
> cpumask_clear(allocated_mask);
> + goto retry;
> }
>
> - cpumask_xor(available_mask, allocated_mask,
> - cpumask_of_node(numa_node));
> -
> target_cpu = cpumask_first(available_mask);
> cpumask_set_cpu(target_cpu, allocated_mask);
>
> @@ -778,7 +785,6 @@ static void init_vp_index(struct vmbus_channel *channel)
> }
>
> channel->target_cpu = target_cpu;
> -
> free_cpumask_var(available_mask);
> }
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 714d549..547ae33 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -21,6 +21,7 @@
> #include <linux/kernel_stat.h>
> #include <linux/clockchips.h>
> #include <linux/cpu.h>
> +#include <linux/sched/isolation.h>
> #include <linux/sched/task_stack.h>
>
> #include <linux/delay.h>
> @@ -1770,6 +1771,9 @@ static ssize_t target_cpu_store(struct vmbus_channel
> *channel,
> if (target_cpu >= nr_cpumask_bits)
> return -EINVAL;
>
> + if (!cpumask_test_cpu(target_cpu,
> housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
> + return -EINVAL;
> +
> /* No CPUs should come up or down during this. */
> cpus_read_lock();
>
> --
> 1.8.3.1
Doing this will actually help DPDK applications on isolated cpus. The history is isolated cpus came first, then cpusets and now the preferred kernel solution is cgroups. For PCI hardware this is handled in userspace typically since it is a policy decision. -----Original Message----- From: Michael Kelley (LINUX) <mikelley@microsoft.com> Sent: Saturday, May 28, 2022 5:56 AM To: Stephen Hemminger <sthemmin@microsoft.com>; Saurabh Sengar <ssengar@linux.microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; Saurabh Singh Sengar <ssengar@microsoft.com> Subject: RE: [PATCH v2] Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs From: Stephen Hemminger <sthemmin@microsoft.com> Sent: Friday, May 27, 2022 8:41 AM > > Would this have impact for DPDK applications using isolated cpus? I don't have any existing knowledge of DPDK use of isolated CPUs, so someone with more expertise feel free to correct me. From what I see in the DPDK documentation (Section 8.3 here: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdoc.dpdk.org%2Fguides%2Flinux_gsg%2Fenable_func.html&data=05%7C01%7Csthemmin%40microsoft.com%7C45f1aefca73845a7470408da40a96d81%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637893393679306569%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=vT3keyehM9AWGhPJ9ItWJhhjN%2Bl7ZGB07l1KapOG0I0%3D&reserved=0), there's no impact. The example in that documentation does CPU isolation only for the purpose of scheduling, not for interrupts. The example kernel command line is: isolcpus=2,4,6 which defaults to "domain" as the "flag" and is equivalent to: isolcpus=domain,2,4,6. VMbus channel interrupts are affected only if "managed_irq" is specified as the flag per the commit message below. And FWIW, cpusets provide a better way to doing scheduler isolation than the isolcpus kernel boot option. Perhaps the DPDK documentation should be updated. :-) Michael
© 2016 - 2026 Red Hat, Inc.