[PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue

Waiman Long posted 2 patches 1 week, 2 days ago
[PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Waiman Long 1 week, 2 days ago
The update_isolation_cpumasks() function can be called either directly
from a regular cpuset control file write with cpuset_full_lock() called,
or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.

As we are going to enable dynamic updates to the nohz_full housekeeping
cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
allowing the CPU hotplug path to call into housekeeping_update() directly
from update_isolation_cpumasks() will likely cause a deadlock. So we
have to defer any call to housekeeping_update() until after the CPU
hotplug operation has finished. This is now done via a workqueue where
the actual housekeeping_update() call, if needed, will happen after
cpus_write_lock is released.

We can't use the synchronous task_work API as the call from the CPU
hotplug path happens in the per-cpu kthread of the CPU that is being
shut down or brought up. Because of the asynchronous nature of
workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
bit later than the "cpuset.cpus.isolated" control file in this case.

Also add a check in test_cpuset_prs.sh and modify some existing
test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
housekeeping cpumask will both be updated.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
 .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7b7d12ab1006..0b0eb1df09d5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -84,6 +84,9 @@ static cpumask_var_t	isolated_cpus;
  */
 static bool isolated_cpus_updating;
 
+/* Both cpuset_mutex and cpus_read_lock acquired */
+static bool cpuset_locked;
+
 /*
  * A flag to force sched domain rebuild at the end of an operation.
  * It can be set in
@@ -285,10 +288,12 @@ void cpuset_full_lock(void)
 {
 	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
+	cpuset_locked = true;
 }
 
 void cpuset_full_unlock(void)
 {
+	cpuset_locked = false;
 	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 }
@@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
 	return false;
 }
 
+static void isolcpus_workfn(struct work_struct *work)
+{
+	cpuset_full_lock();
+	if (isolated_cpus_updating) {
+		WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
+		isolated_cpus_updating = false;
+	}
+	cpuset_full_unlock();
+}
+
 /*
  * update_isolation_cpumasks - Update external isolation related CPU masks
  *
@@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
  */
 static void update_isolation_cpumasks(void)
 {
-	int ret;
+	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
 
 	if (!isolated_cpus_updating)
 		return;
 
-	ret = housekeeping_update(isolated_cpus);
-	WARN_ON_ONCE(ret < 0);
+	/*
+	 * This function can be reached either directly from regular cpuset
+	 * control file write (cpuset_locked) or via hotplug (cpus_write_lock
+	 * && cpuset_mutex held). In the latter case, we defer the
+	 * housekeeping_update() call to the system_unbound_wq to avoid the
+	 * possibility of deadlock. This also means that there will be a short
+	 * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
+	 * behind isolated_cpus.
+	 */
+	if (!cpuset_locked) {
+		/*
+		 * We rely on WORK_STRUCT_PENDING_BIT to not requeue a work
+		 * item that is still pending.
+		 */
+		queue_work(system_unbound_wq, &isolcpus_work);
+		return;
+	}
 
+	WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
 	isolated_cpus_updating = false;
 }
 
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 5dff3ad53867..0502b156582b 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -245,8 +245,9 @@ TEST_MATRIX=(
 	"C2-3:P1:S+  C3:P2  .      .     O2=0   O2=1    .      .     0 A1:2|A2:3 A1:P1|A2:P2"
 	"C2-3:P1:S+  C3:P1  .      .     O2=0    .      .      .     0 A1:|A2:3 A1:P1|A2:P1"
 	"C2-3:P1:S+  C3:P1  .      .     O3=0    .      .      .     0 A1:2|A2: A1:P1|A2:P1"
-	"C2-3:P1:S+  C3:P1  .      .    T:O2=0   .      .      .     0 A1:3|A2:3 A1:P1|A2:P-1"
-	"C2-3:P1:S+  C3:P1  .      .      .    T:O3=0   .      .     0 A1:2|A2:2 A1:P1|A2:P-1"
+	"C2-3:P1:S+  C3:P2  .      .    T:O2=0   .      .      .     0 A1:3|A2:3 A1:P1|A2:P-2"
+	"C1-3:P1:S+  C3:P2  .      .      .    T:O3=0   .      .     0 A1:1-2|A2:1-2 A1:P1|A2:P-2 3|"
+	"C1-3:P1:S+  C3:P2  .      .      .    T:O3=0  O3=1    .     0 A1:1-2|A2:3 A1:P1|A2:P2  3"
 	"$SETUP_A123_PARTITIONS    .     O1=0    .      .      .     0 A1:|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
 	"$SETUP_A123_PARTITIONS    .     O2=0    .      .      .     0 A1:1|A2:|A3:3 A1:P1|A2:P1|A3:P1"
 	"$SETUP_A123_PARTITIONS    .     O3=0    .      .      .     0 A1:1|A2:2|A3: A1:P1|A2:P1|A3:P1"
@@ -764,7 +765,7 @@ check_cgroup_states()
 # only CPUs in isolated partitions as well as those that are isolated at
 # boot time.
 #
-# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
+# $1 - expected isolated cpu list(s) <isolcpus1>{|<isolcpus2>}
 # <isolcpus1> - expected sched/domains value
 # <isolcpus2> - cpuset.cpus.isolated value = <isolcpus1> if not defined
 #
@@ -773,6 +774,7 @@ check_isolcpus()
 	EXPECTED_ISOLCPUS=$1
 	ISCPUS=${CGROUP2}/cpuset.cpus.isolated
 	ISOLCPUS=$(cat $ISCPUS)
+	HKICPUS=$(cat /sys/devices/system/cpu/isolated)
 	LASTISOLCPU=
 	SCHED_DOMAINS=/sys/kernel/debug/sched/domains
 	if [[ $EXPECTED_ISOLCPUS = . ]]
@@ -810,6 +812,11 @@ check_isolcpus()
 	ISOLCPUS=
 	EXPECTED_ISOLCPUS=$EXPECTED_SDOMAIN
 
+	#
+	# The inverse of HK_TYPE_DOMAIN cpumask in $HKICPUS should match $ISOLCPUS
+	#
+	[[ "$ISOLCPUS" != "$HKICPUS" ]] && return 1
+
 	#
 	# Use the sched domain in debugfs to check isolated CPUs, if available
 	#
-- 
2.52.0
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Peter Zijlstra 6 days, 5 hours ago
On Fri, Jan 30, 2026 at 10:42:53AM -0500, Waiman Long wrote:

> +/* Both cpuset_mutex and cpus_read_lock acquired */
> +static bool cpuset_locked;
> +
>  /*
>   * A flag to force sched domain rebuild at the end of an operation.
>   * It can be set in
> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>  {
>  	cpus_read_lock();
>  	mutex_lock(&cpuset_mutex);
> +	cpuset_locked = true;
>  }
>  
>  void cpuset_full_unlock(void)
>  {
> +	cpuset_locked = false;
>  	mutex_unlock(&cpuset_mutex);
>  	cpus_read_unlock();
>  }

> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>   */
>  static void update_isolation_cpumasks(void)
>  {
> -	int ret;
> +	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>  
>  	if (!isolated_cpus_updating)
>  		return;
>  
> -	ret = housekeeping_update(isolated_cpus);
> -	WARN_ON_ONCE(ret < 0);
> +	/*
> +	 * This function can be reached either directly from regular cpuset
> +	 * control file write (cpuset_locked) or via hotplug (cpus_write_lock
> +	 * && cpuset_mutex held). In the latter case, we defer the
> +	 * housekeeping_update() call to the system_unbound_wq to avoid the
> +	 * possibility of deadlock. This also means that there will be a short
> +	 * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
> +	 * behind isolated_cpus.
> +	 */
> +	if (!cpuset_locked) {

I agree with Chen that this is bloody terrible.

At the very least this should have:

	lockdep_assert_held(&cpuset_mutex);

But ideally you'd do patches against this and tip/locking/core that add
proper __guarded_by() annotations to this.
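
Something along these lines, perhaps (untested sketch; assumes the
__guarded_by() helper from the capability-analysis work in
tip/locking/core):

	/* Both cpuset_mutex and cpus_read_lock acquired */
	static bool cpuset_locked __guarded_by(&cpuset_mutex);

That way the compiler can flag any access to cpuset_locked that is not
made with cpuset_mutex held.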

> +		/*
> +		 * We rely on WORK_STRUCT_PENDING_BIT to not requeue a work
> +		 * item that is still pending.
> +		 */
> +		queue_work(system_unbound_wq, &isolcpus_work);
> +		return;
> +	}
>  
> +	WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>  	isolated_cpus_updating = false;
>  }
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Waiman Long 5 days, 23 hours ago
On 2/2/26 8:05 AM, Peter Zijlstra wrote:
> On Fri, Jan 30, 2026 at 10:42:53AM -0500, Waiman Long wrote:
>
>> +/* Both cpuset_mutex and cpus_read_lock acquired */
>> +static bool cpuset_locked;
>> +
>>   /*
>>    * A flag to force sched domain rebuild at the end of an operation.
>>    * It can be set in
>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>   {
>>   	cpus_read_lock();
>>   	mutex_lock(&cpuset_mutex);
>> +	cpuset_locked = true;
>>   }
>>   
>>   void cpuset_full_unlock(void)
>>   {
>> +	cpuset_locked = false;
>>   	mutex_unlock(&cpuset_mutex);
>>   	cpus_read_unlock();
>>   }
>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>    */
>>   static void update_isolation_cpumasks(void)
>>   {
>> -	int ret;
>> +	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>   
>>   	if (!isolated_cpus_updating)
>>   		return;
>>   
>> -	ret = housekeeping_update(isolated_cpus);
>> -	WARN_ON_ONCE(ret < 0);
>> +	/*
>> +	 * This function can be reached either directly from regular cpuset
>> +	 * control file write (cpuset_locked) or via hotplug (cpus_write_lock
>> +	 * && cpuset_mutex held). In the latter case, we defer the
>> +	 * housekeeping_update() call to the system_unbound_wq to avoid the
>> +	 * possibility of deadlock. This also means that there will be a short
>> +	 * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
>> +	 * behind isolated_cpus.
>> +	 */
>> +	if (!cpuset_locked) {
> I agree with Chen that this is bloody terrible.
>
> At the very least this should have:
>
> 	lockdep_assert_held(&cpuset_mutex);
>
> But ideally you'd do patches against this and tip/locking/core that add
> proper __guarded_by() annotations to this.

Yes, I am going to remove cpuset_locked in the next version. As for 
__guarded_by() annotation, I need to set up a clang environment that I 
can use to test it before I work on that. I usually just use gcc
for my compilation needs.

Cheers,
Longman
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Peter Zijlstra 5 days, 22 hours ago
On Mon, Feb 02, 2026 at 01:21:43PM -0500, Waiman Long wrote:

> Yes, I am going to remove cpuset_locked in the next version. As for
> __guarded_by() annotation, I need to set up a clang environment that I can
> use to test it before I work on that. I usually just use gcc for my
> compilation needs.

Debian experimental has clang-22, but there is also:

  https://github.com/llvm/llvm-project/releases/tag/llvmorg-22.1.0-rc2

See: Documentation/kbuild/llvm.rst
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Peter Zijlstra 5 days, 22 hours ago
On Mon, Feb 02, 2026 at 09:04:57PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 02, 2026 at 01:21:43PM -0500, Waiman Long wrote:
> 
> > Yes, I am going to remove cpuset_locked in the next version. As for
> > __guarded_by() annotation, I need to set up a clang environment that I can
> > use to test it before I work on that. I usually just use gcc for my
> > compilation needs.
> 
> Debian experimental has clang-22, but there is also:
> 
>   https://github.com/llvm/llvm-project/releases/tag/llvmorg-22.1.0-rc2

Damn, copied wrong link:

  https://www.kernel.org/pub/tools/llvm/files/llvm-22.1.0-rc2-x86_64.tar.xz

> See: Documentation/kbuild/llvm.rst
>
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Waiman Long 5 days, 17 hours ago
On 2/2/26 3:06 PM, Peter Zijlstra wrote:
> On Mon, Feb 02, 2026 at 09:04:57PM +0100, Peter Zijlstra wrote:
>> On Mon, Feb 02, 2026 at 01:21:43PM -0500, Waiman Long wrote:
>>
>>> Yes, I am going to remove cpuset_locked in the next version. As for
>>> __guarded_by() annotation, I need to set up a clang environment that I can
>>> use to test it before I work on that. I usually just use gcc for my
>>> compilation needs.
>> Debian experimental has clang-22, but there is also:
>>
>>    https://github.com/llvm/llvm-project/releases/tag/llvmorg-22.1.0-rc2
> Damn, copied wrong link:
>
>    https://www.kernel.org/pub/tools/llvm/files/llvm-22.1.0-rc2-x86_64.tar.xz

Thanks for the link. Will play around with that.

Cheers,
Longman

>> See: Documentation/kbuild/llvm.rst
>>
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Chen Ridong 1 week, 1 day ago

On 2026/1/30 23:42, Waiman Long wrote:
> The update_isolation_cpumasks() function can be called either directly
> from a regular cpuset control file write with cpuset_full_lock() called,
> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>
> As we are going to enable dynamic updates to the nohz_full housekeeping
> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
> allowing the CPU hotplug path to call into housekeeping_update() directly
> from update_isolation_cpumasks() will likely cause a deadlock. So we
> have to defer any call to housekeeping_update() until after the CPU
> hotplug operation has finished. This is now done via a workqueue where
> the actual housekeeping_update() call, if needed, will happen after
> cpus_write_lock is released.
>
> We can't use the synchronous task_work API as the call from the CPU
> hotplug path happens in the per-cpu kthread of the CPU that is being
> shut down or brought up. Because of the asynchronous nature of
> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
> bit later than the "cpuset.cpus.isolated" control file in this case.
>
> Also add a check in test_cpuset_prs.sh and modify some existing
> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
> housekeeping cpumask will both be updated.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>  .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>  2 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 7b7d12ab1006..0b0eb1df09d5 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -84,6 +84,9 @@ static cpumask_var_t	isolated_cpus;
>   */
>  static bool isolated_cpus_updating;
>  
> +/* Both cpuset_mutex and cpus_read_lock acquired */
> +static bool cpuset_locked;
> +
>  /*
>   * A flag to force sched domain rebuild at the end of an operation.
>   * It can be set in
> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>  {
>  	cpus_read_lock();
>  	mutex_lock(&cpuset_mutex);
> +	cpuset_locked = true;
>  }
>  
>  void cpuset_full_unlock(void)
>  {
> +	cpuset_locked = false;
>  	mutex_unlock(&cpuset_mutex);
>  	cpus_read_unlock();
>  }
> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>  	return false;
>  }
>  
> +static void isolcpus_workfn(struct work_struct *work)
> +{
> +	cpuset_full_lock();
> +	if (isolated_cpus_updating) {
> +		WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
> +		isolated_cpus_updating = false;
> +	}
> +	cpuset_full_unlock();
> +}
> +
>  /*
>   * update_isolation_cpumasks - Update external isolation related CPU masks
>   *
> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>   */
>  static void update_isolation_cpumasks(void)
>  {
> -	int ret;
> +	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>  
>  	if (!isolated_cpus_updating)
>  		return;
>  
> -	ret = housekeeping_update(isolated_cpus);
> -	WARN_ON_ONCE(ret < 0);
> +	/*
> +	 * This function can be reached either directly from regular cpuset
> +	 * control file write (cpuset_locked) or via hotplug (cpus_write_lock
> +	 * && cpuset_mutex held). In the latter case, we defer the
> +	 * housekeeping_update() call to the system_unbound_wq to avoid the
> +	 * possibility of deadlock. This also means that there will be a short
> +	 * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
> +	 * behind isolated_cpus.
> +	 */
> +	if (!cpuset_locked) {

Adding a global variable makes this difficult to handle, especially in
concurrent scenarios, since we could read it outside of a critical region.

I suggest removing cpuset_locked and adding an
async_update_isolation_cpumasks() instead, which the caller can use when
it is not holding the full lock.
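
Something like the below, for example (just a rough sketch; assumes
isolcpus_work is moved out to file scope):

	/* For the CPU hotplug path, where cpuset_full_lock() is not held */
	static void async_update_isolation_cpumasks(void)
	{
		if (isolated_cpus_updating)
			queue_work(system_unbound_wq, &isolcpus_work);
	}

Then the hotplug path calls async_update_isolation_cpumasks() while the
regular control file write path keeps calling update_isolation_cpumasks()
directly.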

> +		/*
> +		 * We rely on WORK_STRUCT_PENDING_BIT to not requeue a work
> +		 * item that is still pending.
> +		 */
> +		queue_work(system_unbound_wq, &isolcpus_work);
> +		return;
> +	}
>  
> +	WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>  	isolated_cpus_updating = false;
>  }
>  
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index 5dff3ad53867..0502b156582b 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -245,8 +245,9 @@ TEST_MATRIX=(
>  	"C2-3:P1:S+  C3:P2  .      .     O2=0   O2=1    .      .     0 A1:2|A2:3 A1:P1|A2:P2"
>  	"C2-3:P1:S+  C3:P1  .      .     O2=0    .      .      .     0 A1:|A2:3 A1:P1|A2:P1"
>  	"C2-3:P1:S+  C3:P1  .      .     O3=0    .      .      .     0 A1:2|A2: A1:P1|A2:P1"
> -	"C2-3:P1:S+  C3:P1  .      .    T:O2=0   .      .      .     0 A1:3|A2:3 A1:P1|A2:P-1"
> -	"C2-3:P1:S+  C3:P1  .      .      .    T:O3=0   .      .     0 A1:2|A2:2 A1:P1|A2:P-1"
> +	"C2-3:P1:S+  C3:P2  .      .    T:O2=0   .      .      .     0 A1:3|A2:3 A1:P1|A2:P-2"
> +	"C1-3:P1:S+  C3:P2  .      .      .    T:O3=0   .      .     0 A1:1-2|A2:1-2 A1:P1|A2:P-2 3|"
> +	"C1-3:P1:S+  C3:P2  .      .      .    T:O3=0  O3=1    .     0 A1:1-2|A2:3 A1:P1|A2:P2  3"
>  	"$SETUP_A123_PARTITIONS    .     O1=0    .      .      .     0 A1:|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
>  	"$SETUP_A123_PARTITIONS    .     O2=0    .      .      .     0 A1:1|A2:|A3:3 A1:P1|A2:P1|A3:P1"
>  	"$SETUP_A123_PARTITIONS    .     O3=0    .      .      .     0 A1:1|A2:2|A3: A1:P1|A2:P1|A3:P1"
> @@ -764,7 +765,7 @@ check_cgroup_states()
>  # only CPUs in isolated partitions as well as those that are isolated at
>  # boot time.
>  #
> -# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
> +# $1 - expected isolated cpu list(s) <isolcpus1>{|<isolcpus2>}
>  # <isolcpus1> - expected sched/domains value
>  # <isolcpus2> - cpuset.cpus.isolated value = <isolcpus1> if not defined
>  #
> @@ -773,6 +774,7 @@ check_isolcpus()
>  	EXPECTED_ISOLCPUS=$1
>  	ISCPUS=${CGROUP2}/cpuset.cpus.isolated
>  	ISOLCPUS=$(cat $ISCPUS)
> +	HKICPUS=$(cat /sys/devices/system/cpu/isolated)
>  	LASTISOLCPU=
>  	SCHED_DOMAINS=/sys/kernel/debug/sched/domains
>  	if [[ $EXPECTED_ISOLCPUS = . ]]
> @@ -810,6 +812,11 @@ check_isolcpus()
>  	ISOLCPUS=
>  	EXPECTED_ISOLCPUS=$EXPECTED_SDOMAIN
>  
> +	#
> +	# The inverse of HK_TYPE_DOMAIN cpumask in $HKICPUS should match $ISOLCPUS
> +	#
> +	[[ "$ISOLCPUS" != "$HKICPUS" ]] && return 1
> +
>  	#
>  	# Use the sched domain in debugfs to check isolated CPUs, if available
>  	#

-- 
Best regards,
Ridong
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Waiman Long 1 week, 1 day ago
On 1/30/26 7:58 PM, Chen Ridong wrote:
>
> On 2026/1/30 23:42, Waiman Long wrote:
>> The update_isolation_cpumasks() function can be called either directly
>> from a regular cpuset control file write with cpuset_full_lock() called,
>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>>
>> As we are going to enable dynamic updates to the nohz_full housekeeping
>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>> allowing the CPU hotplug path to call into housekeeping_update() directly
>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>> have to defer any call to housekeeping_update() until after the CPU
>> hotplug operation has finished. This is now done via a workqueue where
>> the actual housekeeping_update() call, if needed, will happen after
>> cpus_write_lock is released.
>>
>> We can't use the synchronous task_work API as the call from the CPU
>> hotplug path happens in the per-cpu kthread of the CPU that is being
>> shut down or brought up. Because of the asynchronous nature of
>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>
>> Also add a check in test_cpuset_prs.sh and modify some existing
>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>> housekeeping cpumask will both be updated.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>   .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>   2 files changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 7b7d12ab1006..0b0eb1df09d5 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -84,6 +84,9 @@ static cpumask_var_t	isolated_cpus;
>>    */
>>   static bool isolated_cpus_updating;
>>   
>> +/* Both cpuset_mutex and cpus_read_lock acquired */
>> +static bool cpuset_locked;
>> +
>>   /*
>>    * A flag to force sched domain rebuild at the end of an operation.
>>    * It can be set in
>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>   {
>>   	cpus_read_lock();
>>   	mutex_lock(&cpuset_mutex);
>> +	cpuset_locked = true;
>>   }
>>   
>>   void cpuset_full_unlock(void)
>>   {
>> +	cpuset_locked = false;
>>   	mutex_unlock(&cpuset_mutex);
>>   	cpus_read_unlock();
>>   }
>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>   	return false;
>>   }
>>   
>> +static void isolcpus_workfn(struct work_struct *work)
>> +{
>> +	cpuset_full_lock();
>> +	if (isolated_cpus_updating) {
>> +		WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>> +		isolated_cpus_updating = false;
>> +	}
>> +	cpuset_full_unlock();
>> +}
>> +
>>   /*
>>    * update_isolation_cpumasks - Update external isolation related CPU masks
>>    *
>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>    */
>>   static void update_isolation_cpumasks(void)
>>   {
>> -	int ret;
>> +	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>   
>>   	if (!isolated_cpus_updating)
>>   		return;
>>   
>> -	ret = housekeeping_update(isolated_cpus);
>> -	WARN_ON_ONCE(ret < 0);
>> +	/*
>> +	 * This function can be reached either directly from regular cpuset
>> +	 * control file write (cpuset_locked) or via hotplug (cpus_write_lock
>> +	 * && cpuset_mutex held). In the latter case, we defer the
>> +	 * housekeeping_update() call to the system_unbound_wq to avoid the
>> +	 * possibility of deadlock. This also means that there will be a short
>> +	 * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
>> +	 * behind isolated_cpus.
>> +	 */
>> +	if (!cpuset_locked) {
> Adding a global variable makes this difficult to handle, especially in
> concurrent scenarios, since we could read it outside of a critical region.
No, cpuset_locked is always read or written inside a critical section.
It is protected by cpuset_mutex up to this point, and by cpuset_top_mutex
after the next patch.
>
> I suggest removing cpuset_locked and adding an
> async_update_isolation_cpumasks() instead, which the caller can use
> when it is not holding the full lock.

The point of this global variable is to distinguish between calling from 
CPU hotplug and the other regular cpuset code paths. The only difference 
between these two is having cpus_read_lock or cpus_write_lock held.
That is why I think adding a global variable in cpuset_full_lock() is
the easy way. Otherwise, we will have to add an extra argument to some
of the functions to distinguish these two cases.

Cheers,
Longman
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Chen Ridong 1 week, 1 day ago

On 2026/1/31 9:45, Waiman Long wrote:
> On 1/30/26 7:58 PM, Chen Ridong wrote:
>>
>> On 2026/1/30 23:42, Waiman Long wrote:
>>> The update_isolation_cpumasks() function can be called either directly
>>> from a regular cpuset control file write with cpuset_full_lock() called,
>>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>>>
>>> As we are going to enable dynamic updates to the nohz_full housekeeping
>>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>>> allowing the CPU hotplug path to call into housekeeping_update() directly
>>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>>> have to defer any call to housekeeping_update() until after the CPU
>>> hotplug operation has finished. This is now done via a workqueue where
>>> the actual housekeeping_update() call, if needed, will happen after
>>> cpus_write_lock is released.
>>>
>>> We can't use the synchronous task_work API as the call from the CPU
>>> hotplug path happens in the per-cpu kthread of the CPU that is being
>>> shut down or brought up. Because of the asynchronous nature of
>>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>>
>>> Also add a check in test_cpuset_prs.sh and modify some existing
>>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>>> housekeeping cpumask will both be updated.
>>>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>>   .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>>   2 files changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 7b7d12ab1006..0b0eb1df09d5 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -84,6 +84,9 @@ static cpumask_var_t    isolated_cpus;
>>>    */
>>>   static bool isolated_cpus_updating;
>>>   +/* Both cpuset_mutex and cpus_read_lock acquired */
>>> +static bool cpuset_locked;
>>> +
>>>   /*
>>>    * A flag to force sched domain rebuild at the end of an operation.
>>>    * It can be set in
>>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>>   {
>>>       cpus_read_lock();
>>>       mutex_lock(&cpuset_mutex);
>>> +    cpuset_locked = true;
>>>   }
>>>     void cpuset_full_unlock(void)
>>>   {
>>> +    cpuset_locked = false;
>>>       mutex_unlock(&cpuset_mutex);
>>>       cpus_read_unlock();
>>>   }
>>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate,
>>> struct cpumask *new_cpus)
>>>       return false;
>>>   }
>>>   +static void isolcpus_workfn(struct work_struct *work)
>>> +{
>>> +    cpuset_full_lock();
>>> +    if (isolated_cpus_updating) {
>>> +        WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>>> +        isolated_cpus_updating = false;
>>> +    }
>>> +    cpuset_full_unlock();
>>> +}
>>> +
>>>   /*
>>>    * update_isolation_cpumasks - Update external isolation related CPU masks
>>>    *
>>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int
>>> prstate, struct cpumask *new_cpus)
>>>    */
>>>   static void update_isolation_cpumasks(void)
>>>   {
>>> -    int ret;
>>> +    static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>>         if (!isolated_cpus_updating)
>>>           return;
>>>   -    ret = housekeeping_update(isolated_cpus);
>>> -    WARN_ON_ONCE(ret < 0);
>>> +    /*
>>> +     * This function can be reached either directly from regular cpuset
>>> +     * control file write (cpuset_locked) or via hotplug (cpus_write_lock
>>> +     * && cpuset_mutex held). In the latter case, we defer the
>>> +     * housekeeping_update() call to the system_unbound_wq to avoid the
>>> +     * possibility of deadlock. This also means that there will be a short
>>> +     * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
>>> +     * behind isolated_cpus.
>>> +     */
>>> +    if (!cpuset_locked) {
>> Adding a global variable makes this difficult to handle, especially in
>> concurrent scenarios, since we could read it outside of a critical region.
> No, cpuset_locked is always read or written inside a critical section.
> It is protected by cpuset_mutex up to this point, and by cpuset_top_mutex
> after the next patch.

This is somewhat confusing. cpuset_locked is only set to true when the "full
lock" has been acquired. If cpuset_locked is false, that should mean we are
outside of any critical region. Conversely, if we are inside a critical region,
cpuset_locked should be true.

The situation is a bit messy; it's not clear which lock protects which
global variable.

>>
>> I suggest removing cpuset_locked and adding an
>> async_update_isolation_cpumasks() instead, which the caller can use
>> when it is not holding the full lock.
> 
> The point of this global variable is to distinguish between calling from CPU
> hotplug and the other regular cpuset code paths. The only difference between
> these two is having cpus_read_lock or cpus_write_lock held. That is why I think
> adding a global variable in cpuset_full_lock() is the easy way. Otherwise, we
> will have to add an extra argument to some of the functions to distinguish
> these two cases.
> 
> Cheers,
> Longman
> 

-- 
Best regards,
Ridong

Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Waiman Long 1 week ago
On 1/30/26 9:05 PM, Chen Ridong wrote:
>
> On 2026/1/31 9:45, Waiman Long wrote:
>> On 1/30/26 7:58 PM, Chen Ridong wrote:
>>> On 2026/1/30 23:42, Waiman Long wrote:
>>>> The update_isolation_cpumasks() function can be called either directly
>>>> from a regular cpuset control file write with cpuset_full_lock() called,
>>>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>>>>
>>>> As we are going to enable dynamic updates to the nohz_full housekeeping
>>>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>>>> allowing the CPU hotplug path to call into housekeeping_update() directly
>>>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>>>> have to defer any call to housekeeping_update() until after the CPU
>>>> hotplug operation has finished. This is now done via a workqueue where
>>>> the actual housekeeping_update() call, if needed, will happen after
>>>> cpus_write_lock is released.
>>>>
>>>> We can't use the synchronous task_work API as the call from the CPU
>>>> hotplug path happens in the per-cpu kthread of the CPU that is being
>>>> shut down or brought up. Because of the asynchronous nature of
>>>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>>>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>>>
>>>> Also add a check in test_cpuset_prs.sh and modify some existing
>>>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>>>> housekeeping cpumask will both be updated.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>    kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>>>    .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>>>    2 files changed, 44 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index 7b7d12ab1006..0b0eb1df09d5 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -84,6 +84,9 @@ static cpumask_var_t    isolated_cpus;
>>>>     */
>>>>    static bool isolated_cpus_updating;
>>>>    +/* Both cpuset_mutex and cpus_read_lock acquired */
>>>> +static bool cpuset_locked;
>>>> +
>>>>    /*
>>>>     * A flag to force sched domain rebuild at the end of an operation.
>>>>     * It can be set in
>>>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>>>    {
>>>>        cpus_read_lock();
>>>>        mutex_lock(&cpuset_mutex);
>>>> +    cpuset_locked = true;
>>>>    }
>>>>      void cpuset_full_unlock(void)
>>>>    {
>>>> +    cpuset_locked = false;
>>>>        mutex_unlock(&cpuset_mutex);
>>>>        cpus_read_unlock();
>>>>    }
>>>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate,
>>>> struct cpumask *new_cpus)
>>>>        return false;
>>>>    }
>>>>    +static void isolcpus_workfn(struct work_struct *work)
>>>> +{
>>>> +    cpuset_full_lock();
>>>> +    if (isolated_cpus_updating) {
>>>> +        WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>>>> +        isolated_cpus_updating = false;
>>>> +    }
>>>> +    cpuset_full_unlock();
>>>> +}
>>>> +
>>>>    /*
>>>>     * update_isolation_cpumasks - Update external isolation related CPU masks
>>>>     *
>>>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int
>>>> prstate, struct cpumask *new_cpus)
>>>>     */
>>>>    static void update_isolation_cpumasks(void)
>>>>    {
>>>> -    int ret;
>>>> +    static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>>>          if (!isolated_cpus_updating)
>>>>            return;
>>>>    -    ret = housekeeping_update(isolated_cpus);
>>>> -    WARN_ON_ONCE(ret < 0);
>>>> +    /*
>>>> +     * This function can be reached either directly from regular cpuset
>>>> +     * control file write (cpuset_locked) or via hotplug (cpus_write_lock
>>>> +     * && cpuset_mutex held). In the latter case, we defer the
>>>> +     * housekeeping_update() call to the system_unbound_wq to avoid the
>>>> +     * possibility of deadlock. This also means that there will be a short
>>>> +     * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
>>>> +     * behind isolated_cpus.
>>>> +     */
>>>> +    if (!cpuset_locked) {
>>> Adding a global variable makes this difficult to handle, especially in
>>> concurrent scenarios, since we could read it outside of a critical region.
>> No, cpuset_locked is always read or written inside a critical section.
>> It is protected by cpuset_mutex up to this point, and by cpuset_top_mutex
>> after the next patch.
> This is somewhat confusing. cpuset_locked is only set to true when the "full
> lock" has been acquired. If cpuset_locked is false, that should mean we are
> outside of any critical region. Conversely, if we are inside a critical region,
> cpuset_locked should be true.
>
> The situation is a bit messy; it's not clear which lock protects which
> global variable.

There is a comment above "cpuset_locked" which states which lock protects
it. The locking situation is becoming more complicated. I think I will
add a new patch to more clearly document what each global variable is 
being protected by.
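
Something like the following, perhaps (rough sketch only):

	/*
	 * Protected by cpuset_mutex for now, by cpuset_top_mutex after
	 * the next patch.
	 */
	static bool cpuset_locked;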

Cheers,
Longman

>
>>> I suggest removing cpuset_locked and adding an
>>> async_update_isolation_cpumasks() instead, which the caller can use
>>> when it is not holding the full lock.
>> The point of this global variable is to distinguish between calling from CPU
>> hotplug and the other regular cpuset code paths. The only difference between
>> these two is having cpus_read_lock or cpus_write_lock held. That is why I think
>> adding a global variable in cpuset_full_lock() is the easy way. Otherwise, we
>> will have to add an extra argument to some of the functions to distinguish
>> these two cases.
>>
>> Cheers,
>> Longman
>>

Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Chen Ridong 6 days, 17 hours ago

On 2026/2/1 7:00, Waiman Long wrote:
> On 1/30/26 9:05 PM, Chen Ridong wrote:
>>
>> On 2026/1/31 9:45, Waiman Long wrote:
>>> On 1/30/26 7:58 PM, Chen Ridong wrote:
>>>> On 2026/1/30 23:42, Waiman Long wrote:
>>>>> The update_isolation_cpumasks() function can be called either directly
>>>>> from a regular cpuset control file write with cpuset_full_lock() called,
>>>>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>>>>>
>>>>> As we are going to enable dynamic updates to the nohz_full housekeeping
>>>>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>>>>> allowing the CPU hotplug path to call into housekeeping_update() directly
>>>>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>>>>> have to defer any call to housekeeping_update() until after the CPU
>>>>> hotplug operation has finished. This is now done via a workqueue where
>>>>> the actual housekeeping_update() call, if needed, will happen after
>>>>> cpus_write_lock is released.
>>>>>
>>>>> We can't use the synchronous task_work API as the call from the CPU
>>>>> hotplug path happens in the per-cpu kthread of the CPU that is being
>>>>> shut down or brought up. Because of the asynchronous nature of
>>>>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>>>>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>>>>
>>>>> Also add a check in test_cpuset_prs.sh and modify some existing
>>>>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>>>>> housekeeping cpumask will both be updated.
>>>>>
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>    kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>>>>    .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>>>>    2 files changed, 44 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>> index 7b7d12ab1006..0b0eb1df09d5 100644
>>>>> --- a/kernel/cgroup/cpuset.c
>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>> @@ -84,6 +84,9 @@ static cpumask_var_t    isolated_cpus;
>>>>>     */
>>>>>    static bool isolated_cpus_updating;
>>>>>    +/* Both cpuset_mutex and cpus_read_lock acquired */
>>>>> +static bool cpuset_locked;
>>>>> +
>>>>>    /*
>>>>>     * A flag to force sched domain rebuild at the end of an operation.
>>>>>     * It can be set in
>>>>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>>>>    {
>>>>>        cpus_read_lock();
>>>>>        mutex_lock(&cpuset_mutex);
>>>>> +    cpuset_locked = true;
>>>>>    }
>>>>>      void cpuset_full_unlock(void)
>>>>>    {
>>>>> +    cpuset_locked = false;
>>>>>        mutex_unlock(&cpuset_mutex);
>>>>>        cpus_read_unlock();
>>>>>    }
>>>>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate,
>>>>> struct cpumask *new_cpus)
>>>>>        return false;
>>>>>    }
>>>>>    +static void isolcpus_workfn(struct work_struct *work)
>>>>> +{
>>>>> +    cpuset_full_lock();
>>>>> +    if (isolated_cpus_updating) {
>>>>> +        WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>>>>> +        isolated_cpus_updating = false;
>>>>> +    }
>>>>> +    cpuset_full_unlock();
>>>>> +}
>>>>> +
>>>>>    /*
>>>>>     * update_isolation_cpumasks - Update external isolation related CPU masks
>>>>>     *
>>>>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int
>>>>> prstate, struct cpumask *new_cpus)
>>>>>     */
>>>>>    static void update_isolation_cpumasks(void)
>>>>>    {
>>>>> -    int ret;
>>>>> +    static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>>>>          if (!isolated_cpus_updating)
>>>>>            return;
>>>>>    -    ret = housekeeping_update(isolated_cpus);
>>>>> -    WARN_ON_ONCE(ret < 0);
>>>>> +    /*
>>>>> +     * This function can be reached either directly from regular cpuset
>>>>> +     * control file write (cpuset_locked) or via hotplug (cpus_write_lock
>>>>> +     * && cpuset_mutex held). In the latter case, we defer the
>>>>> +     * housekeeping_update() call to the system_unbound_wq to avoid the
>>>>> +     * possibility of deadlock. This also means that there will be a short
>>>>> +     * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
>>>>> +     * behind isolated_cpus.
>>>>> +     */
>>>>> +    if (!cpuset_locked) {
>>>> Adding a global variable makes this difficult to handle, especially in
>>>> concurrent scenarios, since we could read it outside of a critical region.
>>> No, cpuset_locked is always read or written inside a critical section.
>>> It is protected by cpuset_mutex up to this point, and by cpuset_top_mutex
>>> after the next patch.
>> This is somewhat confusing. cpuset_locked is only set to true when the "full
>> lock" has been acquired. If cpuset_locked is false, that should mean we are
>> outside of any critical region. Conversely, if we are inside a critical region,
>> cpuset_locked should be true.
>>
>> The situation is a bit messy; it's not clear which lock protects which
>> global variable.
> 
> There is a comment above "cpuset_locked" which states which lock protects it. The
> locking situation is becoming more complicated. I think I will add a new patch
> to more clearly document what each global variable is being protected by.
> 

Yes, we need that.

> 
>>
>>>> I suggest removing cpuset_locked and adding an
>>>> async_update_isolation_cpumasks() instead, which the caller can use
>>>> when it is not holding the full lock.
>>> The point of this global variable is to distinguish between calling from CPU
>>> hotplug and the other regular cpuset code paths. The only difference between
>>> these two is having cpus_read_lock or cpus_write_lock held. That is why I think
>>> adding a global variable in cpuset_full_lock() is the easy way. Otherwise, we
>>> will have to add an extra argument to some of the functions to distinguish
>>> these two cases.
>>>
>>> Cheers,
>>> Longman
>>>
> 

-- 
Best regards,
Ridong

Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Chen Ridong 1 week, 1 day ago

On 2026/1/30 23:42, Waiman Long wrote:
> The update_isolation_cpumasks() function can be called either directly
> from a regular cpuset control file write with cpuset_full_lock() called,
> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>
> As we are going to enable dynamic updates to the nohz_full housekeeping
> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
> allowing the CPU hotplug path to call into housekeeping_update() directly
> from update_isolation_cpumasks() will likely cause a deadlock. So we
> have to defer any call to housekeeping_update() until after the CPU
> hotplug operation has finished. This is now done via a workqueue where
> the actual housekeeping_update() call, if needed, will happen after
> cpus_write_lock is released.
>
> We can't use the synchronous task_work API as the call from the CPU
> hotplug path happens in the per-cpu kthread of the CPU that is being
> shut down or brought up. Because of the asynchronous nature of
> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
> bit later than the "cpuset.cpus.isolated" control file in this case.
>
> Also add a check in test_cpuset_prs.sh and modify some existing
> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
> housekeeping cpumask will both be updated.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>  .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>  2 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 7b7d12ab1006..0b0eb1df09d5 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -84,6 +84,9 @@ static cpumask_var_t	isolated_cpus;
>   */
>  static bool isolated_cpus_updating;
>  
> +/* Both cpuset_mutex and cpus_read_lock acquired */
> +static bool cpuset_locked;
> +
>  /*
>   * A flag to force sched domain rebuild at the end of an operation.
>   * It can be set in
> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>  {
>  	cpus_read_lock();
>  	mutex_lock(&cpuset_mutex);
> +	cpuset_locked = true;
>  }
>  
>  void cpuset_full_unlock(void)
>  {
> +	cpuset_locked = false;
>  	mutex_unlock(&cpuset_mutex);
>  	cpus_read_unlock();
>  }
> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>  	return false;
>  }
>  
> +static void isolcpus_workfn(struct work_struct *work)
> +{
> +	cpuset_full_lock();
> +	if (isolated_cpus_updating) {
> +		WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
> +		isolated_cpus_updating = false;
> +	}
> +	cpuset_full_unlock();
> +}
> +
>  /*
>   * update_isolation_cpumasks - Update external isolation related CPU masks
>   *
> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>   */
>  static void update_isolation_cpumasks(void)
>  {
> -	int ret;
> +	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>  
>  	if (!isolated_cpus_updating)
>  		return;
>  

Can this happen?

cpu0					cpu1
[...]

isolated_cpus_updating = true;
...
// 'full_lock' is not acquired
update_isolation_cpumasks
					// exec worker concurrently
					isolcpus_workfn
					cpuset_full_lock
					isolated_cpus_updating = false;
					cpuset_full_unlock();
// This returns incorrectly
if (!isolated_cpus_updating)
	return;

> -	ret = housekeeping_update(isolated_cpus);
> -	WARN_ON_ONCE(ret < 0);
> +	/*
> +	 * This function can be reached either directly from regular cpuset
> +	 * control file write (cpuset_locked) or via hotplug (cpus_write_lock
> +	 * && cpuset_mutex held). In the latter case, we defer the
> +	 * housekeeping_update() call to the system_unbound_wq to avoid the
> +	 * possibility of deadlock. This also means that there will be a short
> +	 * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
> +	 * behind isolated_cpus.
> +	 */
> +	if (!cpuset_locked) {
> +		/*
> +		 * We rely on WORK_STRUCT_PENDING_BIT to not requeue a work
> +		 * item that is still pending.
> +		 */
> +		queue_work(system_unbound_wq, &isolcpus_work);
> +		return;
> +	}
>  
> +	WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>  	isolated_cpus_updating = false;
>  }
>  
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index 5dff3ad53867..0502b156582b 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -245,8 +245,9 @@ TEST_MATRIX=(
>  	"C2-3:P1:S+  C3:P2  .      .     O2=0   O2=1    .      .     0 A1:2|A2:3 A1:P1|A2:P2"
>  	"C2-3:P1:S+  C3:P1  .      .     O2=0    .      .      .     0 A1:|A2:3 A1:P1|A2:P1"
>  	"C2-3:P1:S+  C3:P1  .      .     O3=0    .      .      .     0 A1:2|A2: A1:P1|A2:P1"
> -	"C2-3:P1:S+  C3:P1  .      .    T:O2=0   .      .      .     0 A1:3|A2:3 A1:P1|A2:P-1"
> -	"C2-3:P1:S+  C3:P1  .      .      .    T:O3=0   .      .     0 A1:2|A2:2 A1:P1|A2:P-1"
> +	"C2-3:P1:S+  C3:P2  .      .    T:O2=0   .      .      .     0 A1:3|A2:3 A1:P1|A2:P-2"
> +	"C1-3:P1:S+  C3:P2  .      .      .    T:O3=0   .      .     0 A1:1-2|A2:1-2 A1:P1|A2:P-2 3|"
> +	"C1-3:P1:S+  C3:P2  .      .      .    T:O3=0  O3=1    .     0 A1:1-2|A2:3 A1:P1|A2:P2  3"
>  	"$SETUP_A123_PARTITIONS    .     O1=0    .      .      .     0 A1:|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
>  	"$SETUP_A123_PARTITIONS    .     O2=0    .      .      .     0 A1:1|A2:|A3:3 A1:P1|A2:P1|A3:P1"
>  	"$SETUP_A123_PARTITIONS    .     O3=0    .      .      .     0 A1:1|A2:2|A3: A1:P1|A2:P1|A3:P1"
> @@ -764,7 +765,7 @@ check_cgroup_states()
>  # only CPUs in isolated partitions as well as those that are isolated at
>  # boot time.
>  #
> -# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
> +# $1 - expected isolated cpu list(s) <isolcpus1>{|<isolcpus2>}
>  # <isolcpus1> - expected sched/domains value
>  # <isolcpus2> - cpuset.cpus.isolated value = <isolcpus1> if not defined
>  #
> @@ -773,6 +774,7 @@ check_isolcpus()
>  	EXPECTED_ISOLCPUS=$1
>  	ISCPUS=${CGROUP2}/cpuset.cpus.isolated
>  	ISOLCPUS=$(cat $ISCPUS)
> +	HKICPUS=$(cat /sys/devices/system/cpu/isolated)
>  	LASTISOLCPU=
>  	SCHED_DOMAINS=/sys/kernel/debug/sched/domains
>  	if [[ $EXPECTED_ISOLCPUS = . ]]
> @@ -810,6 +812,11 @@ check_isolcpus()
>  	ISOLCPUS=
>  	EXPECTED_ISOLCPUS=$EXPECTED_SDOMAIN
>  
> +	#
> +	# The inverse of HK_TYPE_DOMAIN cpumask in $HKICPUS should match $ISOLCPUS
> +	#
> +	[[ "$ISOLCPUS" != "$HKICPUS" ]] && return 1
> +
>  	#
>  	# Use the sched domain in debugfs to check isolated CPUs, if available
>  	#

-- 
Best regards,
Ridong
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Waiman Long 1 week, 1 day ago
On 1/30/26 7:47 PM, Chen Ridong wrote:
>
> On 2026/1/30 23:42, Waiman Long wrote:
>> The update_isolation_cpumasks() function can be called either directly
>> from a regular cpuset control file write with cpuset_full_lock() called,
>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
Note this statement.
>>
>> As we are going to enable dynamic updates to the nohz_full housekeeping
>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>> allowing the CPU hotplug path to call into housekeeping_update() directly
>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>> have to defer any call to housekeeping_update() until after the CPU
>> hotplug operation has finished. This is now done via a workqueue where
>> the actual housekeeping_update() call, if needed, will happen after
>> cpus_write_lock is released.
>>
>> We can't use the synchronous task_work API as the call from the CPU
>> hotplug path happens in the per-cpu kthread of the CPU that is being
>> shut down or brought up. Because of the asynchronous nature of
>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>
>> Also add a check in test_cpuset_prs.sh and modify some existing
>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>> housekeeping cpumask will both be updated.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>   .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>   2 files changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 7b7d12ab1006..0b0eb1df09d5 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -84,6 +84,9 @@ static cpumask_var_t	isolated_cpus;
>>    */
>>   static bool isolated_cpus_updating;
>>   
>> +/* Both cpuset_mutex and cpus_read_lock acquired */
>> +static bool cpuset_locked;
>> +
>>   /*
>>    * A flag to force sched domain rebuild at the end of an operation.
>>    * It can be set in
>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>   {
>>   	cpus_read_lock();
>>   	mutex_lock(&cpuset_mutex);
>> +	cpuset_locked = true;
>>   }
>>   
>>   void cpuset_full_unlock(void)
>>   {
>> +	cpuset_locked = false;
>>   	mutex_unlock(&cpuset_mutex);
>>   	cpus_read_unlock();
>>   }
>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>   	return false;
>>   }
>>   
>> +static void isolcpus_workfn(struct work_struct *work)
>> +{
>> +	cpuset_full_lock();
>> +	if (isolated_cpus_updating) {
>> +		WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>> +		isolated_cpus_updating = false;
>> +	}
>> +	cpuset_full_unlock();
>> +}
>> +
>>   /*
>>    * update_isolation_cpumasks - Update external isolation related CPU masks
>>    *
>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>    */
>>   static void update_isolation_cpumasks(void)
>>   {
>> -	int ret;
>> +	static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>   
>>   	if (!isolated_cpus_updating)
>>   		return;
>>   
> Can this happen?
>
> cpu0					cpu1
> [...]
>
> isolated_cpus_updating = true;
> ...
> // 'full_lock' is not acquired
> update_isolation_cpumasks
That is not true. Either cpus_read_lock or cpus_write_lock and 
cpuset_mutex are held when update_isolation_cpumasks() is called. So 
there is mutual exclusion.
> 					// exec worker concurrently
> 					isolcpus_workfn
> 					cpuset_full_lock
> 					isolated_cpus_updating = false;
> 					cpuset_full_unlock();
> // This returns incorrectly
> if (!isolated_cpus_updating)
> 	return;
>
Cheers,
Longman
Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Chen Ridong 1 week, 1 day ago

On 2026/1/31 9:06, Waiman Long wrote:
> 
> On 1/30/26 7:47 PM, Chen Ridong wrote:
>>
>> On 2026/1/30 23:42, Waiman Long wrote:
>>> The update_isolation_cpumasks() function can be called either directly
>>> from a regular cpuset control file write with cpuset_full_lock() called,
>>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
> Note this statement.

Thank you for the reminder.

>>>
>>> As we are going to enable dynamic updates to the nohz_full housekeeping
>>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>>> allowing the CPU hotplug path to call into housekeeping_update() directly
>>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>>> have to defer any call to housekeeping_update() until after the CPU
>>> hotplug operation has finished. This is now done via a workqueue where
>>> the actual housekeeping_update() call, if needed, will happen after
>>> cpus_write_lock is released.
>>>
>>> We can't use the synchronous task_work API as the call from the CPU
>>> hotplug path happens in the per-cpu kthread of the CPU that is being
>>> shut down or brought up. Because of the asynchronous nature of
>>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>>
>>> Also add a check in test_cpuset_prs.sh and modify some existing
>>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>>> housekeeping cpumask will both be updated.
>>>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>>   .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>>   2 files changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 7b7d12ab1006..0b0eb1df09d5 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -84,6 +84,9 @@ static cpumask_var_t    isolated_cpus;
>>>    */
>>>   static bool isolated_cpus_updating;
>>>   +/* Both cpuset_mutex and cpus_read_lock acquired */
>>> +static bool cpuset_locked;
>>> +
>>>   /*
>>>    * A flag to force sched domain rebuild at the end of an operation.
>>>    * It can be set in
>>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>>   {
>>>       cpus_read_lock();
>>>       mutex_lock(&cpuset_mutex);
>>> +    cpuset_locked = true;
>>>   }
>>>     void cpuset_full_unlock(void)
>>>   {
>>> +    cpuset_locked = false;
>>>       mutex_unlock(&cpuset_mutex);
>>>       cpus_read_unlock();
>>>   }
>>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate,
>>> struct cpumask *new_cpus)
>>>       return false;
>>>   }
>>>   +static void isolcpus_workfn(struct work_struct *work)
>>> +{
>>> +    cpuset_full_lock();
>>> +    if (isolated_cpus_updating) {
>>> +        WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>>> +        isolated_cpus_updating = false;
>>> +    }
>>> +    cpuset_full_unlock();
>>> +}
>>> +
>>>   /*
>>>    * update_isolation_cpumasks - Update external isolation related CPU masks
>>>    *
>>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int
>>> prstate, struct cpumask *new_cpus)
>>>    */
>>>   static void update_isolation_cpumasks(void)
>>>   {
>>> -    int ret;
>>> +    static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>>         if (!isolated_cpus_updating)
>>>           return;
>>>   
>> Can this happen?
>>
>> cpu0                    cpu1
>> [...]
>>
>> isolated_cpus_updating = true;
>> ...
>> // 'full_lock' is not acquired
>> update_isolation_cpumasks
> That is not true. Either cpus_read_lock or cpus_write_lock and cpuset_mutex are
> held when update_isolation_cpumasks() is called. So there is mutual exclusion.

Eh, we currently assume that it can only be called from existing scenarios, so
it's okay for now. But I'm concerned that if we later use
update_isolation_cpumasks without realizing that we need to hold either
cpus_write_lock or (cpus_read_lock && cpuset_mutex), we could run into
concurrency issues. Maybe I'm worrying too much.

And maybe we should add a 'lockdep_assert_held' inside update_isolation_cpumasks().
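
For example, something like this at the top of
update_isolation_cpumasks() (sketch):

	lockdep_assert_held(&cpuset_mutex);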

>>                     // exec worker concurrently
>>                     isolcpus_workfn
>>                     cpuset_full_lock
>>                     isolated_cpus_updating = false;
>>                     cpuset_full_unlock();
>> // This returns incorrectly
>> if (!isolated_cpus_updating)
>>     return;
>>
> Cheers,
> Longman
> 

-- 
Best regards,
Ridong

Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue
Posted by Chen Ridong 1 week, 1 day ago

On 2026/1/31 9:43, Chen Ridong wrote:
> 
> 
> On 2026/1/31 9:06, Waiman Long wrote:
>>
>> On 1/30/26 7:47 PM, Chen Ridong wrote:
>>>
>>> On 2026/1/30 23:42, Waiman Long wrote:
>>>> The update_isolation_cpumasks() function can be called either directly
>>>> from a regular cpuset control file write with cpuset_full_lock() called,
>>>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>> Note this statement.
> 
> Thank you for the reminder.
> 
>>>>
>>>> As we are going to enable dynamic updates to the nohz_full housekeeping
>>>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>>>> allowing the CPU hotplug path to call into housekeeping_update() directly
>>>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>>>> have to defer any call to housekeeping_update() until after the CPU
>>>> hotplug operation has finished. This is now done via a workqueue where
>>>> the actual housekeeping_update() call, if needed, will happen after
>>>> cpus_write_lock is released.
>>>>
>>>> We can't use the synchronous task_work API as the call from the CPU
>>>> hotplug path happens in the per-cpu kthread of the CPU that is being
>>>> shut down or brought up. Because of the asynchronous nature of
>>>> workqueues, the HK_TYPE_DOMAIN housekeeping cpumask will be updated a
>>>> bit later than the "cpuset.cpus.isolated" control file in this case.
>>>>
>>>> Also add a check in test_cpuset_prs.sh and modify some existing
>>>> test cases to confirm that "cpuset.cpus.isolated" and the HK_TYPE_DOMAIN
>>>> housekeeping cpumask will both be updated.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>   kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>>>   .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>>>   2 files changed, 44 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index 7b7d12ab1006..0b0eb1df09d5 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -84,6 +84,9 @@ static cpumask_var_t    isolated_cpus;
>>>>    */
>>>>   static bool isolated_cpus_updating;
>>>>   +/* Both cpuset_mutex and cpus_read_lock acquired */
>>>> +static bool cpuset_locked;
>>>> +
>>>>   /*
>>>>    * A flag to force sched domain rebuild at the end of an operation.
>>>>    * It can be set in
>>>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>>>   {
>>>>       cpus_read_lock();
>>>>       mutex_lock(&cpuset_mutex);
>>>> +    cpuset_locked = true;
>>>>   }
>>>>     void cpuset_full_unlock(void)
>>>>   {
>>>> +    cpuset_locked = false;
>>>>       mutex_unlock(&cpuset_mutex);
>>>>       cpus_read_unlock();
>>>>   }
>>>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate,
>>>> struct cpumask *new_cpus)
>>>>       return false;
>>>>   }
>>>>   +static void isolcpus_workfn(struct work_struct *work)
>>>> +{
>>>> +    cpuset_full_lock();
>>>> +    if (isolated_cpus_updating) {
>>>> +        WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>>>> +        isolated_cpus_updating = false;
>>>> +    }
>>>> +    cpuset_full_unlock();
>>>> +}
>>>> +
>>>>   /*
>>>>    * update_isolation_cpumasks - Update external isolation related CPU masks
>>>>    *
>>>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int
>>>> prstate, struct cpumask *new_cpus)
>>>>    */
>>>>   static void update_isolation_cpumasks(void)
>>>>   {
>>>> -    int ret;
>>>> +    static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>>>         if (!isolated_cpus_updating)
>>>>           return;
>>>>   
>>> Can this happen?
>>>
>>> cpu0                    cpu1
>>> [...]
>>>
>>> isolated_cpus_updating = true;
>>> ...
>>> // 'full_lock' is not acquired
>>> update_isolation_cpumasks
>> That is not true. Either cpus_read_lock or cpus_write_lock and cpuset_mutex are
>> held when update_isolation_cpumasks() is called. So there is mutual exclusion.
> 
> Eh, we currently assume that it can only be called from existing scenarios, so
> it's okay for now. But I'm concerned that if we later use
> update_isolation_cpumasks without realizing that we need to hold either
> cpus_write_lock or (cpus_read_lock && cpuset_mutex), we could run into
> concurrency issues. Maybe I'm worrying too much.
> 
> And maybe we should add a 'lockdep_assert_held' inside update_isolation_cpumasks().
> 

I saw in patch 2/2 that isolated_cpus_updating is described as "protected by
cpuset_top_mutex." This could be a bit ambiguous: the caller needs to hold either
cpus_read_lock or cpus_write_lock and cpuset_mutex to protect
isolated_cpus_updating.
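
Maybe spelling that out with assertions would help, e.g. (sketch):

	/* cpu_hotplug_lock held (read or write) plus cpuset_mutex */
	lockdep_assert_cpus_held();
	lockdep_assert_held(&cpuset_mutex);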

>>>                     // exec worker concurrently
>>>                     isolcpus_workfn
>>>                     cpuset_full_lock
>>>                     isolated_cpus_updating = false;
>>>                     cpuset_full_unlock();
>>> // This returns incorrectly
>>> if (!isolated_cpus_updating)
>>>     return;
>>>
>> Cheers,
>> Longman
>>
> 

-- 
Best regards,
Ridong