From nobody Mon May 25 04:33:49 2026
From: Leonardo Bras
To: Jonathan Corbet, Shuah Khan, Leonardo Bras, Peter Zijlstra, Ingo Molnar,
 Will Deacon, Boqun Feng, Waiman Long, Andrew Morton, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Jann Horn, Pedro Falcato, Brendan Jackman,
 Johannes Weiner, Zi Yan, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes,
 Roman Gushchin, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
 Barry Song, Youngjun Park, Qi Zheng, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
 Wei Xu, "Borislav Petkov (AMD)", Randy Dunlap, Feng Tang, Dapeng Mi, Kees Cook,
 Marco Elver, Jakub Kicinski, Li RongQing, Eric Biggers, "Paul E.
 McKenney", Nathan Chancellor, Nicolas Schier, Miguel Ojeda, Thomas Weißschuh,
 Thomas Gleixner, Douglas Anderson, Gary Guo, Christian Brauner, Pasha Tatashin,
 Coiby Xu, Masahiro Yamada, Frederic Weisbecker
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-rt-devel@lists.linux.dev, Marcelo Tosatti
Subject: [PATCH v4 1/4] Introducing pw_lock() and per-cpu queue & flush work
Date: Mon, 18 May 2026 22:27:47 -0300
Message-ID: <20260519012754.240804-2-leobras.c@gmail.com>
In-Reply-To: <20260519012754.240804-1-leobras.c@gmail.com>
References: <20260519012754.240804-1-leobras.c@gmail.com>

Some places in the kernel implement a parallel programming strategy that
uses local_lock() for most of the work and schedules the rare remote
operations on the target CPU. This keeps cache bouncing low, since the
cachelines tend to stay local, and avoids the cost of locks in non-RT
kernels, even though the few remote operations are expensive due to
scheduling overhead.

On the other hand, for RT workloads this can be a problem: scheduling work
on remote CPUs that are executing low-latency tasks is undesirable and can
introduce unexpected deadline misses.

It's interesting, though, that local_lock()s become spinlocks in RT
kernels. We can use that to avoid scheduling work on a remote CPU, by
directly updating another CPU's per-CPU structure while holding its
spinlock.

To do that, it's necessary to introduce a new set of functions that take
another CPU's per-CPU "local" lock (pw_{un,}lock*()), along with the
corresponding queueing (pw_queue_on()) and flushing (pw_flush()) helpers
to run the remote work.

Users of non-RT kernels with low-latency requirements can select similar
functionality with the CONFIG_PWLOCKS compile-time option.

On kernels with CONFIG_PWLOCKS disabled, no changes are expected, as every
one of the introduced helpers works exactly the same as the current
implementation:

pw_{un,}lock*() -> local_{un,}lock*() (ignores cpu parameter)
pw_queue_on()   -> queue_work_on()
pw_flush()      -> flush_work()

For PWLOCKS-enabled kernels, though, pw_{un,}lock*() will use the extra
cpu parameter to select the correct per-CPU structure to work on, and
acquire the spinlock for that CPU.

pw_queue_on() will just call the requested function on the current CPU,
which will operate on another CPU's per-CPU object. Since the local_lock()s
become spinlocks in PWLOCKS-enabled kernels, we are safe doing that.

pw_flush() then becomes a no-op, since no work is actually scheduled on a
remote CPU.

Some minimal code rework is needed to make this mechanism work: the calls
to local_{un,}lock*() in the functions that are currently scheduled on
remote CPUs need to be replaced by pw_{un,}lock*(), so that on
PWLOCKS-enabled kernels they can reference a different CPU. It's also
necessary to use a pw_struct instead of a work_struct, but it just contains
a work_struct and, with CONFIG_PWLOCKS, the target CPU.

This should have almost no impact on non-CONFIG_PWLOCKS kernels: a few
this_cpu_ptr() calls will become per_cpu_ptr(,smp_processor_id()) on
non-hotpath functions.
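
For readers who want to see the two modes side by side, here is a minimal
userspace sketch of the idea described above. It is not the kernel code from
this patch: NR_CPUS_SIM, struct pcp_cache, pw_lock_sim(), pw_unlock_sim(),
pw_queue_on_sim() and the use_spinlocks flag are all hypothetical stand-ins
(the flag plays the role of the static key), and a C11 atomic_flag stands in
for the per-CPU spinlock.

```c
/*
 * Userspace sketch only. Every name here (NR_CPUS_SIM, struct pcp_cache,
 * pw_lock_sim(), pw_queue_on_sim(), ...) is a hypothetical stand-in and
 * not the kernel API added by this patch.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SIM 4

struct pcp_cache {
	atomic_flag lock;	/* stands in for the per-CPU spinlock */
	int cached;		/* items parked in this CPU's cache */
};

static struct pcp_cache caches[NR_CPUS_SIM];
static int global_pool;			/* where drained items end up */
static int deferred_works;		/* counts simulated queue_work_on() calls */
static bool use_spinlocks = true;	/* stands in for the pw_sl static key */

static void caches_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		atomic_flag_clear(&caches[cpu].lock);
		caches[cpu].cached = 0;
	}
	global_pool = 0;
	deferred_works = 0;
}

/* Take the lock that protects the cache of @cpu, local or remote. */
static void pw_lock_sim(int cpu)
{
	while (atomic_flag_test_and_set_explicit(&caches[cpu].lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void pw_unlock_sim(int cpu)
{
	atomic_flag_clear_explicit(&caches[cpu].lock, memory_order_release);
}

/* The "work function": drains the cache of @cpu from whatever CPU runs it. */
static void drain_cache(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	pw_lock_sim(cpu);
	global_pool += caches[cpu].cached;
	caches[cpu].cached = 0;
	pw_unlock_sim(cpu);
}

/*
 * Spinlock mode: run the work on the caller, against @cpu's data.
 * Otherwise only record that work would have been queued on @cpu.
 */
static void pw_queue_on_sim(int cpu)
{
	if (use_spinlocks)
		drain_cache(cpu);
	else
		deferred_works++;
}
```

In spinlock mode the drain is already complete when pw_queue_on_sim()
returns, which is why a flush helper has nothing left to wait for in that
mode.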

On CONFIG_PWLOCKS kernels, this should avoid deadline misses by removing
scheduling noise.

Signed-off-by: Leonardo Bras
Signed-off-by: Marcelo Tosatti
---
 MAINTAINERS                                   |   7 +
 .../admin-guide/kernel-parameters.txt         |  10 +
 Documentation/locking/pwlocks.rst             |  76 +++++
 init/Kconfig                                  |  35 +++
 kernel/Makefile                               |   2 +
 include/linux/pwlocks.h                       | 265 ++++++++++++++++++
 kernel/pwlocks.c                              |  47 ++++
 7 files changed, 442 insertions(+)
 create mode 100644 Documentation/locking/pwlocks.rst
 create mode 100644 include/linux/pwlocks.h
 create mode 100644 kernel/pwlocks.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c2c6d79275c6..7102031207c9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21775,20 +21775,27 @@ QORIQ DPAA2 FSL-MC BUS DRIVER
 M:	Ioana Ciornei
 L:	linuxppc-dev@lists.ozlabs.org
 L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	Documentation/ABI/stable/sysfs-bus-fsl-mc
 F:	Documentation/devicetree/bindings/misc/fsl,qoriq-mc.yaml
 F:	Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst
 F:	drivers/bus/fsl-mc/
 F:	include/uapi/linux/fsl_mc.h
 
+PW Locks
+M:	Leonardo Bras
+S:	Supported
+F:	Documentation/locking/pwlocks.rst
+F:	include/linux/pwlocks.h
+F:	kernel/pwlocks.c
+
 QT1010 MEDIA DRIVER
 L:	linux-media@vger.kernel.org
 S:	Orphan
 W:	https://linuxtv.org
 Q:	http://patchwork.linuxtv.org/project/linux-media/list/
 F:	drivers/media/tuners/qt1010*
 
 QUALCOMM ATH12K WIRELESS DRIVER
 M:	Jeff Johnson
 L:	linux-wireless@vger.kernel.org
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4d0f545fb3ec..68c8a6f9d227 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2810,20 +2810,30 @@ Kernel parameters
 			If a queue's affinity mask contains only isolated
 			CPUs then this parameter has no effect on the
 			interrupt routing decision, though interrupts are
 			only delivered when tasks running on those
 			isolated CPUs submit IO.
 			IO submitted on housekeeping CPUs has no
 			influence on those queues.
 
 			The format of is described above.
 
+	pwlocks=	[KNL,SMP] Select the mechanism used for per-CPU resource
+			sharing and remote operations on a kernel built with
+			CONFIG_PWLOCKS.
+			Format: { "0" | "1" }
+			0 - local_lock() + queue_work_on(remote_cpu)
+			1 - spin_lock() for both local and remote operations
+
+			Selecting 1 may be useful for systems that want
+			to avoid interruptions and context switches from IPIs.
+
 	iucv=		[HW,NET]
 
 	ivrs_ioapic	[HW,X86-64]
 			Provide an override to the IOAPIC-ID<->DEVICE-ID
 			mapping provided in the IVRS ACPI table.
 			By default, PCI segment is 0, and can be omitted.
 
 			For example, to map IOAPIC-ID decimal 10 to
 			PCI segment 0x1 and PCI device 00:14.0,
 			write the parameter as:
diff --git a/Documentation/locking/pwlocks.rst b/Documentation/locking/pwlocks.rst
new file mode 100644
index 000000000000..09f4a5417bc1
--- /dev/null
+++ b/Documentation/locking/pwlocks.rst
@@ -0,0 +1,76 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+PW (Per-CPU Work) locks
+=======================
+
+Some places in the kernel implement a parallel programming strategy
+that uses local_lock() for most of the work and schedules the rare remote
+operations on the target CPU. This keeps cache bouncing low, since the
+cachelines tend to stay local, and avoids the cost of locks in non-RT
+kernels, even though the few remote operations are expensive due to
+scheduling overhead.
+
+On the other hand, for RT workloads this can be a problem:
+scheduling work on remote CPUs that are executing low-latency tasks
+is undesirable and can introduce unexpected deadline misses.
+
+PW locks help to convert sites that use local_lock (for CPU-local operations)
+and queue_work_on (for queueing work remotely, to be executed
+locally on the CPU that owns the lock) to spinlocks.
+
+The lock is declared with the pw_lock_t type.
+The lock is initialized with pw_lock_init.
+The lock is locked with pw_lock (takes a lock and a cpu as parameters).
+The lock is unlocked with pw_unlock (takes a lock and a cpu as parameters).
+
+The pw_lock_irqsave function disables interrupts and saves the current
+interrupt state; it also takes the cpu as a parameter.
+
+For the trylock variant, there is the pw_trylock_t type, initialized with
+pw_trylock_init, and the corresponding pw_trylock and pw_trylock_irqsave.
+
+work_struct should be replaced by pw_struct, which contains a cpu parameter
+(owner cpu of the lock), initialized by INIT_PW.
+
+The queue work related functions (analogous to queue_work_on and flush_work) are:
+pw_queue_on and pw_flush.
+
+The behaviour of the PW lock functions is as follows:
+
+* !CONFIG_PWLOCKS (or CONFIG_PWLOCKS and pwlocks=0 kernel boot parameter):
+  - pw_lock: local_lock
+  - pw_lock_irqsave: local_lock_irqsave
+  - pw_trylock: local_trylock
+  - pw_trylock_irqsave: local_trylock_irqsave
+  - pw_unlock: local_unlock
+  - pw_lock_local: local_lock
+  - pw_trylock_local: local_trylock
+  - pw_unlock_local: local_unlock
+  - pw_queue_on: queue_work_on
+  - pw_flush: flush_work
+
+* CONFIG_PWLOCKS (and CONFIG_PWLOCKS_DEFAULT=y or pwlocks=1 kernel boot parameter):
+  - pw_lock: spin_lock
+  - pw_lock_irqsave: spin_lock_irqsave
+  - pw_trylock: spin_trylock
+  - pw_trylock_irqsave: spin_trylock_irqsave
+  - pw_unlock: spin_unlock
+  - pw_lock_local: preempt_disable OR migrate_disable + spin_lock
+  - pw_trylock_local: preempt_disable OR migrate_disable + spin_trylock
+  - pw_unlock_local: preempt_enable OR migrate_enable + spin_unlock
+  - pw_queue_on: executes work function on caller cpu
+  - pw_flush: empty
+
+pw_get_cpu(work_struct), to be called from within per-cpu work function,
+returns the target cpu.
+
+Among the locking functions above, there are the local locking functions
+(pw_lock_local, pw_trylock_local and pw_unlock_local) that must only
+be used to access per-CPU data from the CPU that owns that data,
+and never remotely. They disable preemption/migration and don't require
+a cpu parameter, making them a replacement for local_lock functions that
+does not introduce overhead.
+
+These should only be used when accessing per-CPU data of the local CPU.
+
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d308ae..3fb751dc4530 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -764,20 +764,55 @@ config CPU_ISOLATION
 	depends on SMP
 	default y
 	help
 	  Make sure that CPUs running critical tasks are not disturbed by
 	  any source of "noise" such as unbound workqueues, timers, kthreads...
 	  Unbound jobs get offloaded to housekeeping CPUs. This is driven by
 	  the "isolcpus=" boot parameter.
 
 	  Say Y if unsure.
 
+config PWLOCKS
+	bool "Per-CPU Work locks"
+	depends on SMP || COMPILE_TEST
+	default n
+	help
+	  Allow changing the behavior on per-CPU resource sharing with cache,
+	  from the regular local_lock() + queue_work_on(remote_cpu) to using
+	  per-CPU spinlocks on both local and remote operations.
+
+	  This is useful to give the user the option of reducing IPIs to CPUs,
+	  and thus reduce interruptions and context switches. On the other hand,
+	  it increases generated code and will use atomic operations if
+	  spinlocks are selected.
+
+	  If set, will use the default behavior set in PWLOCKS_DEFAULT unless boot
+	  parameter pwlocks is passed with a different behavior.
+
+	  If unset, will use the local_lock() + queue_work_on() strategy,
+	  regardless of the boot parameter or PWLOCKS_DEFAULT.
+
+	  Say N if unsure.
+
+config PWLOCKS_DEFAULT
+	bool "Use per-CPU spinlocks by default on PWLOCKS"
+	depends on PWLOCKS
+	default n
+	help
+	  If set, will use per-CPU spinlocks as default behavior for per-CPU
+	  remote operations.
+
+	  If unset, will use local_lock() + queue_work_on(cpu) as default
+	  behavior for remote operations.
+
+	  Say N if unsure.
+
 source "kernel/rcu/Kconfig"
 
 config IKCONFIG
 	tristate "Kernel .config support"
 	help
 	  This option enables the complete Linux kernel ".config" file
 	  contents to be saved in the kernel. It provides documentation
 	  of which kernel options are used in a running kernel or in an
 	  on-disk kernel. This information can be extracted from the kernel
 	  image file with the script scripts/extract-ikconfig and used as
diff --git a/kernel/Makefile b/kernel/Makefile
index 6785982013dc..60ccad0699e7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -135,20 +135,22 @@ obj-$(CONFIG_JUMP_LABEL) += jump_label.o
 obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
 obj-$(CONFIG_TORTURE_TEST) += torture.o
 
 obj-$(CONFIG_HAS_IOMEM) += iomem.o
 obj-$(CONFIG_RSEQ) += rseq.o
 obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o
 
 obj-$(CONFIG_RESOURCE_KUNIT_TEST) += resource_kunit.o
 obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
 
+obj-$(CONFIG_PWLOCKS) += pwlocks.o
+
 CFLAGS_kstack_erase.o += $(DISABLE_KSTACK_ERASE)
 CFLAGS_kstack_erase.o += $(call cc-option,-mgeneral-regs-only)
 obj-$(CONFIG_KSTACK_ERASE) += kstack_erase.o
 KASAN_SANITIZE_kstack_erase.o := n
 KCSAN_SANITIZE_kstack_erase.o := n
 KCOV_INSTRUMENT_kstack_erase.o := n
 
 obj-$(CONFIG_SCF_TORTURE_TEST) += scftorture.o
 
 $(obj)/configs.o: $(obj)/config_data.gz
diff --git a/include/linux/pwlocks.h b/include/linux/pwlocks.h
new file mode 100644
index 000000000000..3d79621655f9
--- /dev/null
+++ b/include/linux/pwlocks.h
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PWLOCKS_H
+#define _LINUX_PWLOCKS_H
+
+#include "linux/spinlock.h"
+#include "linux/local_lock.h"
+#include "linux/workqueue.h"
+
+#ifndef CONFIG_PWLOCKS
+
+typedef local_lock_t pw_lock_t;
+typedef local_trylock_t pw_trylock_t;
+
+struct pw_struct {
+	struct work_struct work;
+};
+
+#define pw_lock_init(lock) \
+	local_lock_init(lock)
+
+#define pw_trylock_init(lock) \
+	local_trylock_init(lock)
+
+#define pw_lock(lock, cpu) \
+	local_lock(lock)
+
+#define pw_lock_local(lock) \
+	local_lock(lock)
+
+#define pw_lock_irqsave(lock, flags, cpu) \
+	local_lock_irqsave(lock, flags)
+
+#define pw_lock_local_irqsave(lock, flags) \
+	local_lock_irqsave(lock, flags)
+
+#define pw_trylock(lock, cpu) \
+	local_trylock(lock)
+
+#define pw_trylock_local(lock) \
+	local_trylock(lock)
+
+#define pw_trylock_irqsave(lock, flags, cpu) \
+	local_trylock_irqsave(lock, flags)
+
+#define pw_unlock(lock, cpu) \
+	local_unlock(lock)
+
+#define pw_unlock_local(lock) \
+	local_unlock(lock)
+
+#define pw_unlock_irqrestore(lock, flags, cpu) \
+	local_unlock_irqrestore(lock, flags)
+
+#define pw_unlock_local_irqrestore(lock, flags) \
+	local_unlock_irqrestore(lock, flags)
+
+#define pw_lockdep_assert_held(lock) \
+	lockdep_assert_held(lock)
+
+#define pw_queue_on(c, wq, pw) \
+	queue_work_on(c, wq, &(pw)->work)
+
+#define pw_flush(pw) \
+	flush_work(&(pw)->work)
+
+#define pw_get_cpu(pw) smp_processor_id()
+
+#define pw_is_cpu_remote(cpu) (false)
+
+#define INIT_PW(pw, func, c) \
+	INIT_WORK(&(pw)->work, (func))
+
+#else /* CONFIG_PWLOCKS */
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_PWLOCKS_DEFAULT, pw_sl);
+
+typedef union {
+	spinlock_t sl;
+	local_lock_t ll;
+} pw_lock_t;
+
+typedef union {
+	spinlock_t sl;
+	local_trylock_t ll;
+} pw_trylock_t;
+
+struct pw_struct {
+	struct work_struct work;
+	int cpu;
+};
+
+#ifdef CONFIG_PREEMPT_RT
+#define preempt_or_migrate_disable migrate_disable
+#define preempt_or_migrate_enable migrate_enable
+#else
+#define preempt_or_migrate_disable preempt_disable
+#define preempt_or_migrate_enable preempt_enable
+#endif
+
+#define pw_lock_init(lock) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		spin_lock_init(lock.sl); \
+	else \
+		local_lock_init(lock.ll); \
+} while (0)
+
+#define pw_trylock_init(lock) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		spin_lock_init(lock.sl); \
+	else \
+		local_trylock_init(lock.ll); \
+} while (0)
+
+#define pw_lock(lock, cpu) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		spin_lock(per_cpu_ptr(lock.sl, cpu)); \
+	else \
+		local_lock(lock.ll); \
+} while (0)
+
+#define pw_lock_local(lock) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) { \
+		preempt_or_migrate_disable(); \
+		spin_lock(this_cpu_ptr(lock.sl)); \
+	} else { \
+		local_lock(lock.ll); \
+	} \
+} while (0)
+
+#define pw_lock_irqsave(lock, flags, cpu) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		spin_lock_irqsave(per_cpu_ptr(lock.sl, cpu), flags); \
+	else \
+		local_lock_irqsave(lock.ll, flags); \
+} while (0)
+
+#define pw_lock_local_irqsave(lock, flags) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) { \
+		preempt_or_migrate_disable(); \
+		spin_lock_irqsave(this_cpu_ptr(lock.sl), flags); \
+	} else { \
+		local_lock_irqsave(lock.ll, flags); \
+	} \
+} while (0)
+
+#define pw_trylock(lock, cpu) \
+({ \
+	int t; \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		t = spin_trylock(per_cpu_ptr(lock.sl, cpu)); \
+	else \
+		t = local_trylock(lock.ll); \
+	t; \
+})
+
+#define pw_trylock_local(lock) \
+({ \
+	int t; \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) { \
+		preempt_or_migrate_disable(); \
+		t = spin_trylock(this_cpu_ptr(lock.sl)); \
+		if (!t) \
+			preempt_or_migrate_enable(); \
+	} else { \
+		t = local_trylock(lock.ll); \
+	} \
+	t; \
+})
+
+#define pw_trylock_irqsave(lock, flags, cpu) \
+({ \
+	int t; \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		t = spin_trylock_irqsave(per_cpu_ptr(lock.sl, cpu), flags); \
+	else \
+		t = local_trylock_irqsave(lock.ll, flags); \
+	t; \
+})
+
+#define pw_unlock(lock, cpu) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		spin_unlock(per_cpu_ptr(lock.sl, cpu)); \
+	else \
+		local_unlock(lock.ll); \
+} while (0)
+
+#define pw_unlock_local(lock) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) { \
+		spin_unlock(this_cpu_ptr(lock.sl)); \
+		preempt_or_migrate_enable(); \
+	} else { \
+		local_unlock(lock.ll); \
+	} \
+} while (0)
+
+#define pw_unlock_irqrestore(lock, flags, cpu) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		spin_unlock_irqrestore(per_cpu_ptr(lock.sl, cpu), flags); \
+	else \
+		local_unlock_irqrestore(lock.ll, flags); \
+} while (0)
+
+#define pw_unlock_local_irqrestore(lock, flags) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) { \
+		spin_unlock_irqrestore(this_cpu_ptr(lock.sl), flags); \
+		preempt_or_migrate_enable(); \
+	} else { \
+		local_unlock_irqrestore(lock.ll, flags); \
+	} \
+} while (0)
+
+#define pw_lockdep_assert_held(lock) \
+do { \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		lockdep_assert_held(this_cpu_ptr(lock.sl)); \
+	else \
+		lockdep_assert_held(this_cpu_ptr(lock.ll)); \
+} while (0)
+
+#define pw_queue_on(c, wq, pw) \
+do { \
+	int __c = c; \
+	struct pw_struct *__pw = (pw); \
+	if (static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) { \
+		WARN_ON((__c) != __pw->cpu); \
+		__pw->work.func(&__pw->work); \
+	} else { \
+		queue_work_on(__c, wq, &(__pw)->work); \
+	} \
+} while (0)
+
+/*
+ * Does nothing if PWLOCKS is set to use spinlock, as the task is already done at the
+ * time pw_queue_on() returns.
+ */
+#define pw_flush(pw) \
+do { \
+	struct pw_struct *__pw = (pw); \
+	if (!static_branch_maybe(CONFIG_PWLOCKS_DEFAULT, &pw_sl)) \
+		flush_work(&__pw->work); \
+} while (0)
+
+#define pw_get_cpu(w) container_of((w), struct pw_struct, work)->cpu
+
+#define pw_is_cpu_remote(cpu) ((cpu) != smp_processor_id())
+
+#define INIT_PW(pw, func, c) \
+do { \
+	struct pw_struct *__pw = (pw); \
+	INIT_WORK(&__pw->work, (func)); \
+	__pw->cpu = (c); \
+} while (0)
+
+#endif /* CONFIG_PWLOCKS */
+#endif /* _LINUX_PWLOCKS_H */
diff --git a/kernel/pwlocks.c b/kernel/pwlocks.c
new file mode 100644
index 000000000000..1ebf5cb979b9
--- /dev/null
+++ b/kernel/pwlocks.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "linux/export.h"
+#include
+#include
+#include
+#include
+
+DEFINE_STATIC_KEY_MAYBE(CONFIG_PWLOCKS_DEFAULT, pw_sl);
+EXPORT_SYMBOL(pw_sl);
+
+static bool pwlocks_param_specified;
+
+static int __init pwlocks_setup(char *str)
+{
+	int opt;
+
+	if (!get_option(&str, &opt)) {
+		pr_warn("PWLOCKS: invalid pwlocks parameter: %s, ignoring.\n", str);
+		return 0;
+	}
+
+	if (opt)
+		static_branch_enable(&pw_sl);
+	else
+		static_branch_disable(&pw_sl);
+
+	pwlocks_param_specified = true;
+
+	return 1;
+}
+__setup("pwlocks=", pwlocks_setup);
+
+/*
+ * Enable PWLOCKS if CPUs want to avoid kernel noise.
+ */
+static int __init pwlocks_init(void)
+{
+	if (pwlocks_param_specified)
+		return 0;
+
+	if (housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+		static_branch_enable(&pw_sl);
+
+	return 0;
+}
+
+late_initcall(pwlocks_init);
-- 
2.54.0

From nobody Mon May 25 04:33:49 2026
From: Leonardo Bras
To: Jonathan Corbet, Shuah Khan, Leonardo Bras, Peter Zijlstra, Ingo Molnar,
 Will Deacon, Boqun Feng, Waiman Long, Andrew Morton, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Jann Horn, Pedro Falcato, Brendan Jackman,
 Johannes Weiner, Zi Yan, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes,
 Roman Gushchin, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
 Barry Song, Youngjun Park, Qi Zheng, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
 Wei Xu, "Borislav Petkov (AMD)", Randy Dunlap, Feng Tang, Dapeng Mi, Kees Cook,
 Marco Elver, Jakub Kicinski, Li RongQing, Eric Biggers, "Paul E.
 McKenney", Nathan Chancellor, Nicolas Schier, Miguel Ojeda, Thomas Weißschuh,
 Thomas Gleixner, Douglas Anderson, Gary Guo, Christian Brauner, Pasha Tatashin,
 Coiby Xu, Masahiro Yamada, Frederic Weisbecker
Cc: Marcelo Tosatti, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-rt-devel@lists.linux.dev
Subject: [PATCH v4 2/4] mm/swap: move bh draining into a separate workqueue
Date: Mon, 18 May 2026 22:27:48 -0300
Message-ID: <20260519012754.240804-3-leobras.c@gmail.com>
In-Reply-To: <20260519012754.240804-1-leobras.c@gmail.com>
References: <20260519012754.240804-1-leobras.c@gmail.com>

From: Marcelo Tosatti

Move the bh draining into a separate workqueue (apart from the mm lru
draining), so that it's possible to switch the mm lru draining to PW locks.

Switching the bh draining to PW locks would require adding a spinlock to
the path that adds bhs to the per-CPU cache, and that is a very hot path.
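
As a rough illustration of the split described above, the sketch below keeps
one "needs drain" predicate and one CPU mask per kind of work, so that each
kind can later be dispatched through a different mechanism. It is a userspace
approximation only: struct cpu_state_sim, NR_CPUS_SIM,
cpu_needs_mm_drain_sim(), cpu_needs_bh_drain_sim() and collect_drain_masks()
are hypothetical names, and plain bitmasks stand in for struct cpumask.

```c
/*
 * Userspace sketch only. All names below are hypothetical stand-ins for
 * the per-CPU state and the cpumasks used in mm/swap.c.
 */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SIM 4

struct cpu_state_sim {
	int lru_batched;	/* folios waiting in this CPU's LRU batches */
	int bh_cached;		/* buffer heads held in this CPU's bh LRU */
};

/* One predicate per kind of work, mirroring the split in this patch. */
static bool cpu_needs_mm_drain_sim(const struct cpu_state_sim *s)
{
	return s->lru_batched > 0;
}

static bool cpu_needs_bh_drain_sim(const struct cpu_state_sim *s)
{
	return s->bh_cached > 0;
}

/* Build one mask per kind of work; bit N set means CPU N needs that drain. */
static void collect_drain_masks(const struct cpu_state_sim *cpus,
				unsigned int *mm_mask, unsigned int *bh_mask)
{
	*mm_mask = 0;
	*bh_mask = 0;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		if (cpu_needs_mm_drain_sim(&cpus[cpu]))
			*mm_mask |= 1u << cpu;
		if (cpu_needs_bh_drain_sim(&cpus[cpu]))
			*bh_mask |= 1u << cpu;
	}
}
```

With two masks, a CPU that only holds buffer heads no longer drags the LRU
drain along with it, and the two drains can be flushed independently.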
Signed-off-by: Marcelo Tosatti Signed-off-by: Leonardo Bras --- mm/swap.c | 52 +++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 15 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 5cc44f0de987..ed9b3d371547 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -744,60 +744,70 @@ void lru_add_drain(void) local_unlock(&cpu_fbatches.lock); mlock_drain_local(); } =20 /* * It's called from per-cpu workqueue context in SMP case so * lru_add_drain_cpu and invalidate_bh_lrus_cpu should run on * the same cpu. It shouldn't be a problem in !SMP case since * the core is only one and the locks will disable preemption. */ -static void lru_add_and_bh_lrus_drain(void) +static void lru_add_mm_drain(void) { local_lock(&cpu_fbatches.lock); lru_add_drain_cpu(smp_processor_id()); local_unlock(&cpu_fbatches.lock); - invalidate_bh_lrus_cpu(); mlock_drain_local(); } =20 void lru_add_drain_cpu_zone(struct zone *zone) { local_lock(&cpu_fbatches.lock); lru_add_drain_cpu(smp_processor_id()); drain_local_pages(zone); local_unlock(&cpu_fbatches.lock); mlock_drain_local(); } =20 #ifdef CONFIG_SMP =20 static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work); =20 static void lru_add_drain_per_cpu(struct work_struct *dummy) { - lru_add_and_bh_lrus_drain(); + lru_add_mm_drain(); } =20 -static bool cpu_needs_drain(unsigned int cpu) +static DEFINE_PER_CPU(struct work_struct, bh_add_drain_work); + +static void bh_add_drain_per_cpu(struct work_struct *dummy) +{ + invalidate_bh_lrus_cpu(); +} + +static bool cpu_needs_mm_drain(unsigned int cpu) { struct cpu_fbatches *fbatches =3D &per_cpu(cpu_fbatches, cpu); =20 /* Check these in order of likelihood that they're not zero */ return folio_batch_count(&fbatches->lru_add) || folio_batch_count(&fbatches->lru_move_tail) || folio_batch_count(&fbatches->lru_deactivate_file) || folio_batch_count(&fbatches->lru_deactivate) || folio_batch_count(&fbatches->lru_lazyfree) || folio_batch_count(&fbatches->lru_activate) || - 
need_mlock_drain(cpu) || - has_bh_in_lru(cpu, NULL); + need_mlock_drain(cpu); +} + +static bool cpu_needs_bh_drain(unsigned int cpu) +{ + return has_bh_in_lru(cpu, NULL); } =20 /* * Doesn't need any cpu hotplug locking because we do rely on per-cpu * kworkers being shut down before our page_alloc_cpu_dead callback is * executed on the offlined cpu. * Calling this function with cpu hotplug locks held can actually lead * to obscure indirect dependencies via WQ context. */ static inline void __lru_add_drain_all(bool force_all_cpus) @@ -806,21 +816,21 @@ static inline void __lru_add_drain_all(bool force_all= _cpus) * lru_drain_gen - Global pages generation number * * (A) Definition: global lru_drain_gen =3D x implies that all generations * 0 < n <=3D x are already *scheduled* for draining. * * This is an optimization for the highly-contended use case where a * user space workload keeps constantly generating a flow of pages for * each CPU. */ static unsigned int lru_drain_gen; - static struct cpumask has_work; + static struct cpumask has_mm_work, has_bh_work; static DEFINE_MUTEX(lock); unsigned cpu, this_gen; =20 /* * Make sure nobody triggers this path before mm_percpu_wq is fully * initialized. */ if (WARN_ON(!mm_percpu_wq)) return; =20 @@ -869,34 +879,45 @@ static inline void __lru_add_drain_all(bool force_all= _cpus) * along, adds some pages to its per-cpu vectors, then calls * lru_add_drain_all(). * * If the paired barrier is done at any later step, e.g. after the * loop, CPU #x will just exit at (C) and miss flushing out all of its * added pages. 
*/ WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1); smp_mb(); =20 - cpumask_clear(&has_work); + cpumask_clear(&has_mm_work); + cpumask_clear(&has_bh_work); for_each_online_cpu(cpu) { - struct work_struct *work =3D &per_cpu(lru_add_drain_work, cpu); + struct work_struct *mm_work =3D &per_cpu(lru_add_drain_work, cpu); + struct work_struct *bh_work =3D &per_cpu(bh_add_drain_work, cpu); =20 - if (cpu_needs_drain(cpu)) { - INIT_WORK(work, lru_add_drain_per_cpu); - queue_work_on(cpu, mm_percpu_wq, work); - __cpumask_set_cpu(cpu, &has_work); + if (cpu_needs_mm_drain(cpu)) { + INIT_WORK(mm_work, lru_add_drain_per_cpu); + queue_work_on(cpu, mm_percpu_wq, mm_work); + __cpumask_set_cpu(cpu, &has_mm_work); + } + + if (cpu_needs_bh_drain(cpu)) { + INIT_WORK(bh_work, bh_add_drain_per_cpu); + queue_work_on(cpu, mm_percpu_wq, bh_work); + __cpumask_set_cpu(cpu, &has_bh_work); } } =20 - for_each_cpu(cpu, &has_work) + for_each_cpu(cpu, &has_mm_work) flush_work(&per_cpu(lru_add_drain_work, cpu)); =20 + for_each_cpu(cpu, &has_bh_work) + flush_work(&per_cpu(bh_add_drain_work, cpu)); + done: mutex_unlock(&lock); } =20 void lru_add_drain_all(void) { __lru_add_drain_all(false); } #else void lru_add_drain_all(void) @@ -928,21 +949,22 @@ void lru_cache_disable(void) * * Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on * preempt_disable() regions of code. So any CPU which sees * lru_disable_count =3D 0 will have exited the critical * section when synchronize_rcu() returns. */ synchronize_rcu_expedited(); #ifdef CONFIG_SMP __lru_add_drain_all(true); #else - lru_add_and_bh_lrus_drain(); + lru_add_mm_drain(); + invalidate_bh_lrus_cpu(); #endif } =20 /** * folios_put_refs - Reduce the reference count on a batch of folios. * @folios: The folios. * @refs: The number of refs to subtract from each folio. * * Like folio_put(), but for a batch of folios. 
This is more efficient * than writing the loop yourself as it will optimise the locks which need --=20 2.54.0 From nobody Mon May 25 04:33:49 2026 Received: from mail-wr1-f50.google.com (mail-wr1-f50.google.com [209.85.221.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7119D2E6CCD for ; Tue, 19 May 2026 01:28:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779154088; cv=none; b=XRx999iQkK+JA/gVt7NzJA5r0idWNUf6UCE2rDSVCc3o5Pjev+5eIQbzEPW4+VIhKHRcCu6MyqCcWqb7gX05fkk5tIkadXr9G6L63OChMXxT+HvfgozY181nscO5KMAwdMw74jEAMrcbRCv3s8bsVSvf69effd23KjCusC1/CvI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779154088; c=relaxed/simple; bh=4BVHLvuBFvyjuj3EzPBbAtH6I8IO+E5EmUGwpgvpZIw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lADdwEG+jdphxiOpiC0nmjZISuWYi1PKJBEVpwXZGEQQDyui9e0oq5zL15ppw4FQXo+iviFxaDA2hKNrKnGryYTPghnLZE6FVixLYD0d0PMEKikE36iB4vs1MMK2hsrR6jNimH8ZenB2NjZB37rEqTJQu9lo9rv7gIoKJgcqRPs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=s8zzxH1U; arc=none smtp.client-ip=209.85.221.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="s8zzxH1U" Received: by mail-wr1-f50.google.com with SMTP id ffacd0b85a97d-441209fb77eso1699847f8f.1 for ; Mon, 18 May 2026 18:28:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20251104; t=1779154083; x=1779758883; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=NYIFZtEOROXKEaAPsW/zk5dSrEVUPynUxnKhGIAxI0A=; b=s8zzxH1UmDAn9PbeXgz06pAygV6k/PFHRZWZVumSb+BAqK15frXA4panHcyr/FS37a 10KSK+4qVGS/G3NTj4P2iK9RN202E+ZPb5O/pyg/X56ndRi3JAVwFslXVZdpJAo7taor y2Q4BOVh1++Rt9TnyAFln30cFN0LTnF2sO0ldvAaHM5h1GeXzL7fK8YKJwnhIdbx3Bm+ pAuVi8sjHjuy+dvtEJAMolu1H6K26aL/sAiTlBP1ZzGuuyHUJ8wOzL0wZ19vSIr5qG5f 6pb8yGGtAsPnYVtBz4mNaeEeeV+exNH6pnzLceUuFs7GgVK8pvZfrbuzfRgLawANLIDR bHsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779154083; x=1779758883; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=NYIFZtEOROXKEaAPsW/zk5dSrEVUPynUxnKhGIAxI0A=; b=FG2GSAypYZ+J/+z2avVG76RMhIlJLO9x8XhZSsnz6PJZapmBQ+8BX5xA8scQCPEiir fYU32TUBIB2HTn7ST/3zBTqffCGph+JWKvArH9vAISWyzW6+JDLTHxiHVCLdELtVIQKU QfCdiKyW22aBuWVIlhe8O7Giu2nVsoGKXW2T4Nf+kgTFwZOg3vq5GEixS0N0bwi7ZFsr SQm0WF+6/SM0cdjZXWxpYWqOFHm5MkmpK19thXWz2YA3IGWG5ZbmDMLMy3r5GraPcC7x Z//WrvEPFZDUoTJcRJ9z0WM1kuEzRroHg/KiDmDheQwaaESrciwG7bzY5A4KEFIE0smi Iyag== X-Forwarded-Encrypted: i=1; AFNElJ8rzfKlgG9H/NOjWQ0fVjbtcf5RNCXBZ1KKZisZnZCd04BhT5X+XTpQCy8tSzwKJig3S4Bt8KaLk0yqdac=@vger.kernel.org X-Gm-Message-State: AOJu0YznAdBcduiA2V4liEJrf+O8TOxzOPoTynFRC306FP/I27gX79y+ 1mZHc4zS7m9NVSXYhE2M16YDtGfwEO3GKwpJAf/BXr+EgG+TGzL5uhZE X-Gm-Gg: Acq92OHkFgv/gCWR4wg39KfN0/xFMNoHxYAmythEbzLGqcTP1n1KbwPzRyhDwUAKaNb 1xQxmHz4Z9YNpflskjvnhAVqRiiWVtnv4mgZn0EyPmMyEdVV+I7/sJqZ7O3Ymn6m5NKuYr8VQMl /jaPpbH4hd5nzyZ+BZL8CUC6txhwGFteRcnKwJ9eUbXL3ZJOGVrdnaZjL7AjHB0J6JlYMzj/A9c KO2NjilgRmpN8XoK6vFe4fnAMUJc0LAQJwKiFMyTZb1NlM7bcWkU1edrjElno1pRMxcPnpDSkMz +BoV4azlnxJeV94fbbNh1vTNUvNMvtb6YHmKfUX4BzOsUY0jTOSFSnYNyM4LVCWImMgxiYyTkX+ 
zYwAkGiC6f3qegj6aV0uvC43CpnTGc3sH23kjzlwtOp25HsIpU13NihHbRD3kLU8ZC+TEoSau3u RJ5gRSIuh+qq9Z73V1bw+5Zz1U6EZZ2A== X-Received: by 2002:a05:6000:1a89:b0:452:bc74:b129 with SMTP id ffacd0b85a97d-45e5b886090mr27913668f8f.16.1779154082589; Mon, 18 May 2026 18:28:02 -0700 (PDT) Received: from WindFlash.powerhub ([2a0a:ef40:f83:8501:800:cd4:5e2:9556]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-45d9ed2f738sm40548683f8f.16.2026.05.18.18.28.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 May 2026 18:28:02 -0700 (PDT) From: Leonardo Bras To: Jonathan Corbet , Shuah Khan , Leonardo Bras , Peter Zijlstra , Ingo Molnar , Will Deacon , Boqun Feng , Waiman Long , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Jann Horn , Pedro Falcato , Brendan Jackman , Johannes Weiner , Zi Yan , Harry Yoo , Hao Li , Christoph Lameter , David Rientjes , Roman Gushchin , Chris Li , Kairui Song , Kemeng Shi , Nhat Pham , Baoquan He , Barry Song , Youngjun Park , Qi Zheng , Shakeel Butt , Axel Rasmussen , Yuanchu Xie , Wei Xu , "Borislav Petkov (AMD)" , Randy Dunlap , Feng Tang , Dapeng Mi , Kees Cook , Marco Elver , Jakub Kicinski , Li RongQing , Eric Biggers , "Paul E. 
McKenney" , Nathan Chancellor , Nicolas Schier , Miguel Ojeda , =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= , Thomas Gleixner , Douglas Anderson , Gary Guo , Christian Brauner , Pasha Tatashin , Coiby Xu , Masahiro Yamada , Frederic Weisbecker Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-rt-devel@lists.linux.dev, Marcelo Tosatti Subject: [PATCH v4 3/4] swap: apply new pw_queue_on() interface Date: Mon, 18 May 2026 22:27:49 -0300 Message-ID: <20260519012754.240804-4-leobras.c@gmail.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260519012754.240804-1-leobras.c@gmail.com> References: <20260519012754.240804-1-leobras.c@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=18514; i=leobras.c@gmail.com; h=from:subject; bh=4BVHLvuBFvyjuj3EzPBbAtH6I8IO+E5EmUGwpgvpZIw=; b=owGbwMvMwCX2pizjszvTwvWMp9WSGLK494Tn32I2yTyYcNf+2gxpTttQn28Xuy5dClV5u+9Kg fdhrvXZHaUsDGJcDLJiiiyyj+av4vk+JePIlR8LYOawMoEMYeDiFICJzEhn+Ke/3XQKX/6NHQ1r lqyp+db4e7lR2Jpv0/472m5+VGmyNiSMkWGj5y7lVyWfTuptXGOse/JHiYrpFp3fMo/XW66d/fT Qvi3sAA== X-Developer-Key: i=leobras.c@gmail.com; a=openpgp; fpr=36E6C95AE0F111CC5B6F4D2E688C33F8A0C5B0C5 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make use of the new pw_{un,}lock*() and pw_queue_on() interface to improve performance & latency. For functions that may be scheduled on a different CPU, replace local_{un,}lock*() with pw_{un,}lock*(), and replace schedule_work_on() with pw_queue_on(). Likewise, replace flush_work() with pw_flush(). The change requires allocating pw_structs instead of work_structs, and changing the parameters of a few functions to include the cpu parameter. 
This should bring no relevant performance impact on non-PWLOCKS kernels: For functions that may be scheduled in a different cpu, the local_*lock's this_cpu_ptr() becomes a per_cpu_ptr(smp_processor_id()). Signed-off-by: Leonardo Bras Signed-off-by: Marcelo Tosatti --- mm/internal.h | 4 ++- mm/mlock.c | 51 ++++++++++++++++++++++++++---------- mm/page_alloc.c | 2 +- mm/swap.c | 69 ++++++++++++++++++++++++++----------------------- 4 files changed, 79 insertions(+), 47 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 5a2ddcf68e0b..1ec9a11c373b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1209,24 +1209,26 @@ static inline void munlock_vma_folio(struct folio *= folio, * cause folio not fully mapped to VMA. * * But it's not easy to confirm that's the situation. So we * always munlock the folio and page reclaim will correct it * if it's wrong. */ if (unlikely(vma->vm_flags & VM_LOCKED)) munlock_folio(folio); } =20 +int __init mlock_init(void); void mlock_new_folio(struct folio *folio); bool need_mlock_drain(int cpu); void mlock_drain_local(void); -void mlock_drain_remote(int cpu); +void mlock_drain_cpu(int cpu); +void mlock_drain_offline(int cpu); =20 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); =20 /** * vma_address - Find the virtual address a page range is mapped at * @vma: The vma which maps this object. * @pgoff: The page offset within its object. * @nr_pages: The number of pages to consider. 
* * If any page in this range is mapped by this VMA, return the first addre= ss diff --git a/mm/mlock.c b/mm/mlock.c index 8c227fefa2df..5d25bbbb09e9 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -18,31 +18,30 @@ #include #include #include #include #include #include #include #include #include #include +#include =20 #include "internal.h" =20 struct mlock_fbatch { - local_lock_t lock; + pw_lock_t lock; struct folio_batch fbatch; }; =20 -static DEFINE_PER_CPU(struct mlock_fbatch, mlock_fbatch) =3D { - .lock =3D INIT_LOCAL_LOCK(lock), -}; +static DEFINE_PER_CPU(struct mlock_fbatch, mlock_fbatch); =20 bool can_do_mlock(void) { if (rlimit(RLIMIT_MEMLOCK) !=3D 0) return true; if (capable(CAP_IPC_LOCK)) return true; return false; } EXPORT_SYMBOL(can_do_mlock); @@ -202,32 +201,43 @@ static void mlock_folio_batch(struct folio_batch *fba= tch) lruvec =3D __mlock_new_folio(folio, lruvec); else lruvec =3D __munlock_folio(folio, lruvec); } =20 if (lruvec) lruvec_unlock_irq(lruvec); folios_put(fbatch); } =20 +void mlock_drain_cpu(int cpu) +{ + struct folio_batch *fbatch; + + pw_lock(&mlock_fbatch.lock, cpu); + fbatch =3D per_cpu_ptr(&mlock_fbatch.fbatch, cpu); + if (folio_batch_count(fbatch)) + mlock_folio_batch(fbatch); + pw_unlock(&mlock_fbatch.lock, cpu); +} + void mlock_drain_local(void) { struct folio_batch *fbatch; =20 - local_lock(&mlock_fbatch.lock); + pw_lock_local(&mlock_fbatch.lock); fbatch =3D this_cpu_ptr(&mlock_fbatch.fbatch); if (folio_batch_count(fbatch)) mlock_folio_batch(fbatch); - local_unlock(&mlock_fbatch.lock); + pw_unlock_local(&mlock_fbatch.lock); } =20 -void mlock_drain_remote(int cpu) +void mlock_drain_offline(int cpu) { struct folio_batch *fbatch; =20 WARN_ON_ONCE(cpu_online(cpu)); fbatch =3D &per_cpu(mlock_fbatch.fbatch, cpu); if (folio_batch_count(fbatch)) mlock_folio_batch(fbatch); } =20 bool need_mlock_drain(int cpu) @@ -236,79 +246,79 @@ bool need_mlock_drain(int cpu) } =20 /** * mlock_folio - mlock a folio already on (or temporarily off) LRU * 
@folio: folio to be mlocked. */ void mlock_folio(struct folio *folio) { struct folio_batch *fbatch; =20 - local_lock(&mlock_fbatch.lock); + pw_lock_local(&mlock_fbatch.lock); fbatch =3D this_cpu_ptr(&mlock_fbatch.fbatch); =20 if (!folio_test_set_mlocked(folio)) { int nr_pages =3D folio_nr_pages(folio); =20 zone_stat_mod_folio(folio, NR_MLOCK, nr_pages); __count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages); } =20 folio_get(folio); if (!folio_batch_add(fbatch, mlock_lru(folio)) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); - local_unlock(&mlock_fbatch.lock); + pw_unlock_local(&mlock_fbatch.lock); } =20 /** * mlock_new_folio - mlock a newly allocated folio not yet on LRU * @folio: folio to be mlocked, either normal or a THP head. */ void mlock_new_folio(struct folio *folio) { struct folio_batch *fbatch; int nr_pages =3D folio_nr_pages(folio); =20 - local_lock(&mlock_fbatch.lock); + pw_lock_local(&mlock_fbatch.lock); fbatch =3D this_cpu_ptr(&mlock_fbatch.fbatch); folio_set_mlocked(folio); =20 zone_stat_mod_folio(folio, NR_MLOCK, nr_pages); __count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages); =20 folio_get(folio); if (!folio_batch_add(fbatch, mlock_new(folio)) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); - local_unlock(&mlock_fbatch.lock); + pw_unlock_local(&mlock_fbatch.lock); } =20 /** * munlock_folio - munlock a folio * @folio: folio to be munlocked, either normal or a THP head. */ void munlock_folio(struct folio *folio) { struct folio_batch *fbatch; =20 - local_lock(&mlock_fbatch.lock); + pw_lock_local(&mlock_fbatch.lock); fbatch =3D this_cpu_ptr(&mlock_fbatch.fbatch); /* * folio_test_clear_mlocked(folio) must be left to __munlock_folio(), * which will check whether the folio is multiply mlocked. 
*/ folio_get(folio); if (!folio_batch_add(fbatch, folio) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); - local_unlock(&mlock_fbatch.lock); + pw_unlock_local(&mlock_fbatch.lock); } =20 static inline unsigned int folio_mlock_step(struct folio *folio, pte_t *pte, unsigned long addr, unsigned long end) { unsigned int count =3D (end - addr) >> PAGE_SHIFT; pte_t ptent =3D ptep_get(pte); =20 if (!folio_test_large(folio)) return 1; @@ -822,10 +832,25 @@ int user_shm_lock(size_t size, struct ucounts *ucount= s) return allowed; } =20 void user_shm_unlock(size_t size, struct ucounts *ucounts) { spin_lock(&shmlock_user_lock); dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1)= >> PAGE_SHIFT); spin_unlock(&shmlock_user_lock); put_ucounts(ucounts); } + +int __init mlock_init(void) +{ + unsigned int cpu; + + for_each_possible_cpu(cpu) { + struct mlock_fbatch *fbatch =3D &per_cpu(mlock_fbatch, cpu); + + pw_lock_init(&fbatch->lock); + } + + return 0; +} + +module_init(mlock_init); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 227d58dc3de6..fa768f07f88a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6217,21 +6217,21 @@ void free_reserved_page(struct page *page) __free_page(page); adjust_managed_page_count(page, 1); } EXPORT_SYMBOL(free_reserved_page); =20 static int page_alloc_cpu_dead(unsigned int cpu) { struct zone *zone; =20 lru_add_drain_cpu(cpu); - mlock_drain_remote(cpu); + mlock_drain_offline(cpu); drain_pages(cpu); =20 /* * Spill the event counters of the dead processor * into the current processors event counters. * This artificially elevates the count of the current * processor. 
*/ vm_events_fold_cpu(cpu); =20 diff --git a/mm/swap.c b/mm/swap.c index ed9b3d371547..42f51bf4bb71 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -28,54 +28,51 @@ #include #include #include #include #include #include #include #include #include #include -#include +#include #include =20 #include "internal.h" =20 #define CREATE_TRACE_POINTS #include =20 /* How many pages do we try to swap or page in/out together? As a power of= 2 */ int page_cluster; static const int page_cluster_max =3D 31; =20 struct cpu_fbatches { /* * The following folio batches are grouped together because they are prot= ected * by disabling preemption (and interrupts remain enabled). */ - local_lock_t lock; + pw_lock_t lock; struct folio_batch lru_add; struct folio_batch lru_deactivate_file; struct folio_batch lru_deactivate; struct folio_batch lru_lazyfree; #ifdef CONFIG_SMP struct folio_batch lru_activate; #endif /* Protecting the following batches which require disabling interrupts */ - local_lock_t lock_irq; + pw_lock_t lock_irq; struct folio_batch lru_move_tail; }; =20 -static DEFINE_PER_CPU(struct cpu_fbatches, cpu_fbatches) =3D { - .lock =3D INIT_LOCAL_LOCK(lock), - .lock_irq =3D INIT_LOCAL_LOCK(lock_irq), -}; +static DEFINE_PER_CPU(struct cpu_fbatches, cpu_fbatches); =20 static void __page_cache_release(struct folio *folio, struct lruvec **lruv= ecp, unsigned long *flagsp) { if (folio_test_lru(folio)) { folio_lruvec_relock_irqsave(folio, lruvecp, flagsp); lruvec_del_folio(*lruvecp, folio); __folio_clear_lru_flags(folio); } } @@ -180,32 +177,32 @@ static void folio_batch_move_lru(struct folio_batch *= fbatch, move_fn_t move_fn) } =20 static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch, struct folio *folio, move_fn_t move_fn, bool disable_irq) { unsigned long flags; =20 folio_get(folio); =20 if (disable_irq) - local_lock_irqsave(&cpu_fbatches.lock_irq, flags); + pw_lock_local_irqsave(&cpu_fbatches.lock_irq, flags); else - local_lock(&cpu_fbatches.lock); + 
pw_lock_local(&cpu_fbatches.lock); =20 if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn); =20 if (disable_irq) - local_unlock_irqrestore(&cpu_fbatches.lock_irq, flags); + pw_unlock_local_irqrestore(&cpu_fbatches.lock_irq, flags); else - local_unlock(&cpu_fbatches.lock); + pw_unlock_local(&cpu_fbatches.lock); } =20 #define folio_batch_add_and_move(folio, op) \ __folio_batch_add_and_move( \ &cpu_fbatches.op, \ folio, \ op, \ offsetof(struct cpu_fbatches, op) >=3D \ offsetof(struct cpu_fbatches, lock_irq) \ ) @@ -356,21 +353,21 @@ void folio_activate(struct folio *folio) lruvec_unlock_irq(lruvec); folio_set_lru(folio); } #endif =20 static void __lru_cache_activate_folio(struct folio *folio) { struct folio_batch *fbatch; int i; =20 - local_lock(&cpu_fbatches.lock); + pw_lock_local(&cpu_fbatches.lock); fbatch =3D this_cpu_ptr(&cpu_fbatches.lru_add); =20 /* * Search backwards on the optimistic assumption that the folio being * activated has just been added to this batch. Note that only * the local batch is examined as a !LRU folio could be in the * process of being released, reclaimed, migrated or on a remote * batch that is currently being drained. 
Furthermore, marking * a remote batch's folio active potentially hits a race where * a folio is marked active just after it is added to the inactive @@ -378,21 +375,21 @@ static void __lru_cache_activate_folio(struct folio *= folio) */ for (i =3D folio_batch_count(fbatch) - 1; i >=3D 0; i--) { struct folio *batch_folio =3D fbatch->folios[i]; =20 if (batch_folio =3D=3D folio) { folio_set_active(folio); break; } } =20 - local_unlock(&cpu_fbatches.lock); + pw_unlock_local(&cpu_fbatches.lock); } =20 #ifdef CONFIG_LRU_GEN =20 static void lru_gen_inc_refs(struct folio *folio) { unsigned long new_flags, old_flags =3D READ_ONCE(folio->flags.f); =20 if (folio_test_unevictable(folio)) return; @@ -652,23 +649,23 @@ void lru_add_drain_cpu(int cpu) =20 if (folio_batch_count(fbatch)) folio_batch_move_lru(fbatch, lru_add); =20 fbatch =3D &fbatches->lru_move_tail; /* Disabling interrupts below acts as a compiler barrier. */ if (data_race(folio_batch_count(fbatch))) { unsigned long flags; =20 /* No harm done if a racing interrupt already did this */ - local_lock_irqsave(&cpu_fbatches.lock_irq, flags); + pw_lock_irqsave(&cpu_fbatches.lock_irq, flags, cpu); folio_batch_move_lru(fbatch, lru_move_tail); - local_unlock_irqrestore(&cpu_fbatches.lock_irq, flags); + pw_unlock_irqrestore(&cpu_fbatches.lock_irq, flags, cpu); } =20 fbatch =3D &fbatches->lru_deactivate_file; if (folio_batch_count(fbatch)) folio_batch_move_lru(fbatch, lru_deactivate_file); =20 fbatch =3D &fbatches->lru_deactivate; if (folio_batch_count(fbatch)) folio_batch_move_lru(fbatch, lru_deactivate); =20 @@ -732,56 +729,56 @@ void folio_mark_lazyfree(struct folio *folio) if (!folio_test_anon(folio) || !folio_test_swapbacked(folio) || !folio_test_lru(folio) || folio_test_swapcache(folio) || folio_test_unevictable(folio)) return; =20 folio_batch_add_and_move(folio, lru_lazyfree); } =20 void lru_add_drain(void) { - local_lock(&cpu_fbatches.lock); + pw_lock_local(&cpu_fbatches.lock); lru_add_drain_cpu(smp_processor_id()); - 
local_unlock(&cpu_fbatches.lock); + pw_unlock_local(&cpu_fbatches.lock); mlock_drain_local(); } =20 /* * It's called from per-cpu workqueue context in SMP case so * lru_add_drain_cpu and invalidate_bh_lrus_cpu should run on * the same cpu. It shouldn't be a problem in !SMP case since * the core is only one and the locks will disable preemption. */ -static void lru_add_mm_drain(void) +static void lru_add_mm_drain(int cpu) { - local_lock(&cpu_fbatches.lock); - lru_add_drain_cpu(smp_processor_id()); - local_unlock(&cpu_fbatches.lock); - mlock_drain_local(); + pw_lock(&cpu_fbatches.lock, cpu); + lru_add_drain_cpu(cpu); + pw_unlock(&cpu_fbatches.lock, cpu); + mlock_drain_cpu(cpu); } =20 void lru_add_drain_cpu_zone(struct zone *zone) { - local_lock(&cpu_fbatches.lock); + pw_lock_local(&cpu_fbatches.lock); lru_add_drain_cpu(smp_processor_id()); drain_local_pages(zone); - local_unlock(&cpu_fbatches.lock); + pw_unlock_local(&cpu_fbatches.lock); mlock_drain_local(); } =20 #ifdef CONFIG_SMP =20 -static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work); +static DEFINE_PER_CPU(struct pw_struct, lru_add_drain_pw); =20 -static void lru_add_drain_per_cpu(struct work_struct *dummy) +static void lru_add_drain_per_cpu(struct work_struct *w) { - lru_add_mm_drain(); + lru_add_mm_drain(pw_get_cpu(w)); } =20 static DEFINE_PER_CPU(struct work_struct, bh_add_drain_work); =20 static void bh_add_drain_per_cpu(struct work_struct *dummy) { invalidate_bh_lrus_cpu(); } =20 static bool cpu_needs_mm_drain(unsigned int cpu) @@ -882,38 +879,38 @@ static inline void __lru_add_drain_all(bool force_all= _cpus) * If the paired barrier is done at any later step, e.g. after the * loop, CPU #x will just exit at (C) and miss flushing out all of its * added pages. 
*/ WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1); smp_mb(); =20 cpumask_clear(&has_mm_work); cpumask_clear(&has_bh_work); for_each_online_cpu(cpu) { - struct work_struct *mm_work =3D &per_cpu(lru_add_drain_work, cpu); + struct pw_struct *mm_pw =3D &per_cpu(lru_add_drain_pw, cpu); struct work_struct *bh_work =3D &per_cpu(bh_add_drain_work, cpu); =20 if (cpu_needs_mm_drain(cpu)) { - INIT_WORK(mm_work, lru_add_drain_per_cpu); - queue_work_on(cpu, mm_percpu_wq, mm_work); + INIT_PW(mm_pw, lru_add_drain_per_cpu, cpu); + pw_queue_on(cpu, mm_percpu_wq, mm_pw); __cpumask_set_cpu(cpu, &has_mm_work); } =20 if (cpu_needs_bh_drain(cpu)) { INIT_WORK(bh_work, bh_add_drain_per_cpu); queue_work_on(cpu, mm_percpu_wq, bh_work); __cpumask_set_cpu(cpu, &has_bh_work); } } =20 for_each_cpu(cpu, &has_mm_work) - flush_work(&per_cpu(lru_add_drain_work, cpu)); + pw_flush(&per_cpu(lru_add_drain_pw, cpu)); =20 for_each_cpu(cpu, &has_bh_work) flush_work(&per_cpu(bh_add_drain_work, cpu)); =20 done: mutex_unlock(&lock); } =20 void lru_add_drain_all(void) { @@ -949,21 +946,21 @@ void lru_cache_disable(void) * * Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on * preempt_disable() regions of code. So any CPU which sees * lru_disable_count =3D 0 will have exited the critical * section when synchronize_rcu() returns. */ synchronize_rcu_expedited(); #ifdef CONFIG_SMP __lru_add_drain_all(true); #else - lru_add_mm_drain(); + lru_add_mm_drain(smp_processor_id()); invalidate_bh_lrus_cpu(); #endif } =20 /** * folios_put_refs - Reduce the reference count on a batch of folios. * @folios: The folios. * @refs: The number of refs to subtract from each folio. * * Like folio_put(), but for a batch of folios. 
This is more efficient @@ -1156,23 +1153,31 @@ static const struct ctl_table swap_sysctl_table[] = =3D { .extra2 =3D (void *)&page_cluster_max, } }; =20 /* * Perform any setup for the swap system */ void __init swap_setup(void) { unsigned long megs =3D PAGES_TO_MB(totalram_pages()); + unsigned int cpu; =20 /* Use a smaller cluster for small-memory machines */ if (megs < 16) page_cluster =3D 2; else page_cluster =3D 3; /* * Right now other parts of the system means that we * _really_ don't want to cluster much more */ =20 register_sysctl_init("vm", swap_sysctl_table); + + for_each_possible_cpu(cpu) { + struct cpu_fbatches *fbatches =3D &per_cpu(cpu_fbatches, cpu); + + pw_lock_init(&fbatches->lock); + pw_lock_init(&fbatches->lock_irq); + } } --=20 2.54.0 From nobody Mon May 25 04:33:49 2026 Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 564C62DFF04 for ; Tue, 19 May 2026 01:28:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.45 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779154090; cv=none; b=QIbVcXBUwnm9aVElNwYGvPnm2BO4Et5n7K+vtaIE1muIYVP1KcFqo3t+ckz1bhMtGv33fcJvgykxnxgo+tiT6a7yGDoU8QRUsoUmdUfchRTe3ltWbpuiTQfPYZui9KIMIJRiOet0i6TSmi28nuTTAhxtEEUuTcKhw0kKaM2sGWI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779154090; c=relaxed/simple; bh=8q9feMK42IfvEu2tWg1UBCdmj8KxKXE5X0orIY01H4k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ilJpK4mOdaOucFLS1VuSN2h+LufFsUiLVW9f06KkWgpT/1pgOpbLiXMTi3XtZSKxSwgC4mKwd9DOSUpE/tkk6YDkfvRhhAVOFUStu6tw9YvbN1NFQGENrJK6t4QTcSDNsg8d+ec637w/yEEfwlUZ+hztpWaZm4ScmcwpDg+Epzk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass 
smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=pBDTfIP1; arc=none smtp.client-ip=209.85.221.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="pBDTfIP1" Received: by mail-wr1-f45.google.com with SMTP id ffacd0b85a97d-444826c16ffso2541750f8f.1 for ; Mon, 18 May 2026 18:28:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779154085; x=1779758885; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Mx0oFUKOV8XCrLOMX8VahRqZqhy0/sjUBJqBWDg1Dmg=; b=pBDTfIP1390WfPuTaQf5Jic+frb9dxnuZurbfAj9qaIGH9MQLk3zCdyguylOd8LxVb 1w/CnFnOobvhn3h/QjsD0jGr87QvaDYYxLql967Z26r+rwuWGZY60r3v2HhBe49uPMNy VzaIFcAFjuPr2FeqEMT2f3qhwfPnogskHmtkARMZ8HtJruuTB2bx5aPlUlEo5lR820D+ XDGBPm9XOBL9iXjUwrZ7L2W66GV6mePjiliVgBycQ9Ukaw5Ne/TD/674HIE/fonAA/qZ NlMLDxlpbk5Uy34YqjmpmFD2/MdmiNL3GVNVxHozJgOcULo8CqXdM8bu3fCf2nfa8JOc wrkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779154085; x=1779758885; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=Mx0oFUKOV8XCrLOMX8VahRqZqhy0/sjUBJqBWDg1Dmg=; b=VFwENHxlFril7ZSCCjbWjXXkh2CDucscF9eikxox5rJ+/9FnLYTSXojALdJYIDRO/r /ZRPvPgIAE6Cpobrfmr0EVYAEca2wB+rSHhAD5sG6AX++XohqHbkP3+ya1WJ7Xf3uWNK lUVK6Dh6ZMIYkvsUYw9ZuBVuJ7xHEuqvehYh9cqtMnrId0//+p0XqFUin9JZ/cPe2/18 dWR3p+ZspS/LG/LZdUdlpkuQIEHUa7PdcXGg/uOM3xenbf3XwbOOHqtVtyp1sU4owFPo 9KELIT1KYnrRgb+DEh5tCotuPhDoLclS2+GZzudiI8N/n8vas0pTxzhy5Wd9+ACmSVrJ gc4w== X-Forwarded-Encrypted: i=1; 
AFNElJ8s4Nn14r5eJidcp8cb3M8XWP6UAylrNzCsdln5Pea9A10efnOjuM7DQPMUIM+mfuESz0R5KaiMd/CZR1o=@vger.kernel.org X-Gm-Message-State: AOJu0Yy5EMYb2uLYrdt/td/Tfof5UM7NiWtqCfuSyEvuteYPdedpLGcO gm4lDLtjjWWCjfWD6W/bZx2M0QD6mTv7m3t72zdWHvAmeDRrUIxCUUxp X-Gm-Gg: Acq92OEfGWSQshghgdghnwdewKuEj+p9O45+57Qr6CGROz/jRpzMC5BdkpnC19YsFTZ gJMbRazRjedGbtgGNL7IFWvIUeRpqLR/HbGrm4FvyF/brzhve14DJ0BEmsFQP0Mh0WFFW0ytFRA lfLidJsXTsDifcO357IhunutJxkiXs1TurHOboOh7wq8N9WK8+XFqPnJIpaZo/6IZnX9fkcNrX4 j9zyvC12uD87jg2/eTYleG/PxYOKfn/O/HMzKOdtj9Ya5ccpjZ7y3tcWWQD4fYg1VD7Q3b5KDaA NdrJwzcUuyh4TR3jfFR/6JfHPicrxDTxwHW8OHaGB2EERXOccoN/c0EeLUzQLWrjPaeJdZC1evh 6H0UfkitobTg3VjjhjZs5KX51kekLwH/dCSkhlOte/QTt+Oq2wOCsiBaZQqbY2z1B+1G0tfo/eh udf57z60lOIdXCEx0Sm/3VQsXmGfVjIA== X-Received: by 2002:a05:6000:401e:b0:449:c5e2:a8b7 with SMTP id ffacd0b85a97d-45e5c59ec1cmr26953205f8f.30.1779154084539; Mon, 18 May 2026 18:28:04 -0700 (PDT) Received: from WindFlash.powerhub ([2a0a:ef40:f83:8501:800:cd4:5e2:9556]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-45d9ed2f738sm40548683f8f.16.2026.05.18.18.28.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 May 2026 18:28:03 -0700 (PDT) From: Leonardo Bras To: Jonathan Corbet , Shuah Khan , Leonardo Bras , Peter Zijlstra , Ingo Molnar , Will Deacon , Boqun Feng , Waiman Long , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Jann Horn , Pedro Falcato , Brendan Jackman , Johannes Weiner , Zi Yan , Harry Yoo , Hao Li , Christoph Lameter , David Rientjes , Roman Gushchin , Chris Li , Kairui Song , Kemeng Shi , Nhat Pham , Baoquan He , Barry Song , Youngjun Park , Qi Zheng , Shakeel Butt , Axel Rasmussen , Yuanchu Xie , Wei Xu , "Borislav Petkov (AMD)" , Randy Dunlap , Feng Tang , Dapeng Mi , Kees Cook , Marco Elver , Jakub Kicinski , Li RongQing , Eric Biggers , "Paul E. 
McKenney" , Nathan Chancellor , Nicolas Schier , Miguel Ojeda , =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= , Thomas Gleixner , Douglas Anderson , Gary Guo , Christian Brauner , Pasha Tatashin , Coiby Xu , Masahiro Yamada , Frederic Weisbecker Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-rt-devel@lists.linux.dev, Marcelo Tosatti Subject: [PATCH v4 4/4] slub: apply new pw_queue_on() interface Date: Mon, 18 May 2026 22:27:50 -0300 Message-ID: <20260519012754.240804-5-leobras.c@gmail.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260519012754.240804-1-leobras.c@gmail.com> References: <20260519012754.240804-1-leobras.c@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=29095; i=leobras.c@gmail.com; h=from:subject; bh=8q9feMK42IfvEu2tWg1UBCdmj8KxKXE5X0orIY01H4k=; b=owGbwMvMwCX2pizjszvTwvWMp9WSGLK494R/udDBw8uy/TbzSS6JFZ0Mb99Gnn3dxnln8aP/W SFHImped5SyMIhxMciKKbLIPpq/iuf7lIwjV34sgJnDygQyhIGLUwAmwq7OyHDJu2QN67bJX6NK hJ2mJd3Ys5BpwvwkD81a/c6wF48FOvMYGb5+dqwVkmau9nJqYI91quYvSnU/Maf5b0lBm+oE7+/ rWAE= X-Developer-Key: i=leobras.c@gmail.com; a=openpgp; fpr=36E6C95AE0F111CC5B6F4D2E688C33F8A0C5B0C5 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make use of the new pw_{un,}lock*() and pw_queue_on() interfaces to improve performance and latency. For functions that may be scheduled on a different CPU, replace local_{un,}lock*() with pw_{un,}lock*(), and replace queue_work_on() with pw_queue_on(). Likewise, flush_work() is replaced with pw_flush(). This change requires allocating pw_structs instead of work_structs, and changing the parameters of a few functions to include the cpu parameter. 
This should have no relevant performance impact on non-PWLOCKS kernels: for functions that may be scheduled on a different CPU, the this_cpu_ptr() used by the local_*lock() operations becomes a per_cpu_ptr(smp_processor_id()). Signed-off-by: Leonardo Bras Signed-off-by: Marcelo Tosatti --- mm/slub.c | 142 +++++++++++++++++++++++++++--------------------------- 1 file changed, 72 insertions(+), 70 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 8f9004536729..a154d20e78f7 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -43,20 +43,21 @@ #include #include #include #include #include #include #include #include #include #include +#include #include =20 #include "internal.h" =20 /* * Lock order: * 0. cpu_hotplug_lock * 1. slab_mutex (Global Mutex) * 2a. kmem_cache->cpu_sheaves->lock (Local trylock) * 2b. barn->lock (Spinlock) @@ -122,21 +123,21 @@ * (Note that the total number of slabs is an atomic value that may be * modified without taking the list lock). * * The list_lock is a centralized lock and thus we avoid taking it as * much as possible. As long as SLUB does not have to handle partial * slabs, operations can continue without any centralized lock. * * For debug caches, all allocations are forced to go through a list_lock * protected region to serialize against concurrent validation. * - * cpu_sheaves->lock (local_trylock) + * cpu_sheaves->lock (pw_trylock) * * This lock protects fastpath operations on the percpu sheaves. On !RT = it * only disables preemption and does no atomic operations. As long as th= e main * or spare sheaf can handle the allocation or free, there is no other * overhead. * * barn->lock (spinlock) * * This lock protects the operations on per-NUMA-node barn. It can quick= ly * serve an empty or full sheaf if available, and avoid more expensive r= efill @@ -150,21 +151,21 @@ * cmpxchg_double this is done by a lockless update of slab's freelist a= nd * counters, otherwise slab_lock is taken. 
This only needs to take the * list_lock if it's a first free to a full slab, or when a slab becomes= empty * after the free. * * irq, preemption, migration considerations * * Interrupts are disabled as part of list_lock or barn lock operations,= or * around the slab_lock operation, in order to make the slab allocator s= afe * to use in the context of an irq. - * Preemption is disabled as part of local_trylock operations. + * Preemption is disabled as part of pw_trylock operations. * kmalloc_nolock() and kfree_nolock() are safe in NMI context but see * their limitations. * * SLUB assigns two object arrays called sheaves for caching allocations a= nd * frees on each cpu, with a NUMA node shared barn for balancing between c= pus. * Allocations and frees are primarily served from these sheaves. * * Slabs with free elements are kept on a partial list and during regular * operations no list for full slabs is used. If an object in a full slab = is * freed then the slab will show up again on the partial lists. @@ -411,21 +412,21 @@ struct slab_sheaf { bool pfmemalloc; }; }; struct kmem_cache *cache; unsigned int size; int node; /* only used for rcu_sheaf */ void *objects[]; }; =20 struct slub_percpu_sheaves { - local_trylock_t lock; + pw_trylock_t lock; struct slab_sheaf *main; /* never NULL when unlocked */ struct slab_sheaf *spare; /* empty or full, may be NULL */ struct slab_sheaf *rcu_free; /* for batching kfree_rcu() */ }; =20 /* * The slab lists for all objects. */ struct kmem_cache_node { spinlock_t list_lock; @@ -477,21 +478,21 @@ static nodemask_t slab_nodes; * Corresponds to N_ONLINE nodes. */ static nodemask_t slab_barn_nodes; =20 /* * Workqueue used for flushing cpu and kfree_rcu sheaves. 
*/ static struct workqueue_struct *flushwq; =20 struct slub_flush_work { - struct work_struct work; + struct pw_struct pw; struct kmem_cache *s; bool skip; }; =20 static DEFINE_MUTEX(flush_lock); static DEFINE_PER_CPU(struct slub_flush_work, slub_flush); =20 /******************************************************************** * Core slab cache functions *******************************************************************/ @@ -2838,74 +2839,74 @@ static void __kmem_cache_free_bulk(struct kmem_cach= e *s, size_t size, void **p); * Free all objects from the main sheaf. In order to perform * __kmem_cache_free_bulk() outside of cpu_sheaves->lock, work in batches = where * object pointers are moved to a on-stack array under the lock. To bound = the * stack usage, limit each batch to PCS_BATCH_MAX. * * Must be called with s->cpu_sheaves->lock locked, returns with the lock * unlocked. * * Returns how many objects are remaining to be flushed */ -static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s) +static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s, int cpu) { struct slub_percpu_sheaves *pcs; unsigned int batch, remaining; void *objects[PCS_BATCH_MAX]; struct slab_sheaf *sheaf; =20 - lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock)); - - pcs =3D this_cpu_ptr(s->cpu_sheaves); + pcs =3D per_cpu_ptr(s->cpu_sheaves, cpu); sheaf =3D pcs->main; =20 batch =3D min(PCS_BATCH_MAX, sheaf->size); =20 sheaf->size -=3D batch; memcpy(objects, sheaf->objects + sheaf->size, batch * sizeof(void *)); =20 remaining =3D sheaf->size; =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock(&s->cpu_sheaves->lock, cpu); =20 __kmem_cache_free_bulk(s, batch, &objects[0]); =20 stat_add(s, SHEAF_FLUSH, batch); =20 return remaining; } =20 -static void sheaf_flush_main(struct kmem_cache *s) +static void sheaf_flush_main(struct kmem_cache *s, int cpu) { unsigned int remaining; =20 do { - local_lock(&s->cpu_sheaves->lock); + pw_lock(&s->cpu_sheaves->lock, cpu); =20 - 
remaining =3D __sheaf_flush_main_batch(s); + remaining =3D __sheaf_flush_main_batch(s, cpu); =20 } while (remaining); } =20 /* * Returns true if the main sheaf was at least partially flushed. */ static bool sheaf_try_flush_main(struct kmem_cache *s) { unsigned int remaining; bool ret =3D false; =20 do { - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) return ret; =20 ret =3D true; - remaining =3D __sheaf_flush_main_batch(s); + + pw_lockdep_assert_held(&s->cpu_sheaves->lock); + remaining =3D __sheaf_flush_main_batch(s, smp_processor_id()); =20 } while (remaining); =20 return ret; } =20 /* * Free all objects from a sheaf that's unused, i.e. not linked to any * cpu_sheaves, so we need no locking and batching. The locking is also not * necessary when flushing cpu's sheaves (both spare and main) during cpu @@ -2968,45 +2969,45 @@ static void rcu_free_sheaf_nobarn(struct rcu_head *= head) =20 /* * Caller needs to make sure migration is disabled in order to fully flush * single cpu's sheaves * * must not be called from an irq * * flushing operations are rare so let's keep it simple and flush to slabs * directly, skipping the barn */ -static void pcs_flush_all(struct kmem_cache *s) +static void pcs_flush_all(struct kmem_cache *s, int cpu) { struct slub_percpu_sheaves *pcs; struct slab_sheaf *spare, *rcu_free; =20 - local_lock(&s->cpu_sheaves->lock); - pcs =3D this_cpu_ptr(s->cpu_sheaves); + pw_lock(&s->cpu_sheaves->lock, cpu); + pcs =3D per_cpu_ptr(s->cpu_sheaves, cpu); =20 spare =3D pcs->spare; pcs->spare =3D NULL; =20 rcu_free =3D pcs->rcu_free; pcs->rcu_free =3D NULL; =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock(&s->cpu_sheaves->lock, cpu); =20 if (spare) { sheaf_flush_unused(s, spare); free_empty_sheaf(s, spare); } =20 if (rcu_free) call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn); =20 - sheaf_flush_main(s); + sheaf_flush_main(s, cpu); } =20 static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu) 
{ struct slub_percpu_sheaves *pcs; =20 pcs =3D per_cpu_ptr(s->cpu_sheaves, cpu); =20 /* The cpu is not executing anymore so we don't need pcs->lock */ sheaf_flush_unused(s, pcs->main); @@ -3942,83 +3943,84 @@ static bool has_pcs_used(int cpu, struct kmem_cache= *s) =20 /* * Flush percpu sheaves * * Called from CPU work handler with migration disabled. */ static void flush_cpu_sheaves(struct work_struct *w) { struct kmem_cache *s; struct slub_flush_work *sfw; + int cpu =3D pw_get_cpu(w); =20 - sfw =3D container_of(w, struct slub_flush_work, work); - + sfw =3D &per_cpu(slub_flush, cpu); s =3D sfw->s; =20 if (cache_has_sheaves(s)) - pcs_flush_all(s); + pcs_flush_all(s, cpu); } =20 static void flush_all_cpus_locked(struct kmem_cache *s) { struct slub_flush_work *sfw; unsigned int cpu; =20 lockdep_assert_cpus_held(); mutex_lock(&flush_lock); =20 for_each_online_cpu(cpu) { sfw =3D &per_cpu(slub_flush, cpu); if (!has_pcs_used(cpu, s)) { sfw->skip =3D true; continue; } - INIT_WORK(&sfw->work, flush_cpu_sheaves); + INIT_PW(&sfw->pw, flush_cpu_sheaves, cpu); sfw->skip =3D false; sfw->s =3D s; - queue_work_on(cpu, flushwq, &sfw->work); + pw_queue_on(cpu, flushwq, &sfw->pw); } =20 for_each_online_cpu(cpu) { sfw =3D &per_cpu(slub_flush, cpu); if (sfw->skip) continue; - flush_work(&sfw->work); + pw_flush(&sfw->pw); } =20 mutex_unlock(&flush_lock); } =20 static void flush_all(struct kmem_cache *s) { cpus_read_lock(); flush_all_cpus_locked(s); cpus_read_unlock(); } =20 static void flush_rcu_sheaf(struct work_struct *w) { struct slub_percpu_sheaves *pcs; struct slab_sheaf *rcu_free; struct slub_flush_work *sfw; struct kmem_cache *s; + int cpu =3D pw_get_cpu(w); =20 - sfw =3D container_of(w, struct slub_flush_work, work); + sfw =3D &per_cpu(slub_flush, cpu); s =3D sfw->s; =20 - local_lock(&s->cpu_sheaves->lock); - pcs =3D this_cpu_ptr(s->cpu_sheaves); + pw_lock(&s->cpu_sheaves->lock, cpu); + pcs =3D per_cpu_ptr(s->cpu_sheaves, cpu); =20 rcu_free =3D pcs->rcu_free; pcs->rcu_free =3D 
NULL; =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock(&s->cpu_sheaves->lock, cpu); =20 if (rcu_free) call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn); } =20 =20 /* needed for kvfree_rcu_barrier() */ void flush_rcu_sheaves_on_cache(struct kmem_cache *s) { struct slub_flush_work *sfw; @@ -4029,28 +4031,28 @@ void flush_rcu_sheaves_on_cache(struct kmem_cache *= s) for_each_online_cpu(cpu) { sfw =3D &per_cpu(slub_flush, cpu); =20 /* * we don't check if rcu_free sheaf exists - racing * __kfree_rcu_sheaf() might have just removed it. * by executing flush_rcu_sheaf() on the cpu we make * sure the __kfree_rcu_sheaf() finished its call_rcu() */ =20 - INIT_WORK(&sfw->work, flush_rcu_sheaf); + INIT_PW(&sfw->pw, flush_rcu_sheaf, cpu); sfw->s =3D s; - queue_work_on(cpu, flushwq, &sfw->work); + pw_queue_on(cpu, flushwq, &sfw->pw); } =20 for_each_online_cpu(cpu) { sfw =3D &per_cpu(slub_flush, cpu); - flush_work(&sfw->work); + pw_flush(&sfw->pw); } =20 mutex_unlock(&flush_lock); } =20 void flush_all_rcu_sheaves(void) { struct kmem_cache *s; =20 cpus_read_lock(); @@ -4589,36 +4591,36 @@ bool slab_post_alloc_hook(struct kmem_cache *s, str= uct list_lru *lru, * unlocked. 
*/ static struct slub_percpu_sheaves * __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves = *pcs, gfp_t gfp) { struct slab_sheaf *empty =3D NULL; struct slab_sheaf *full; struct node_barn *barn; bool allow_spin; =20 - lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock)); + pw_lockdep_assert_held(&s->cpu_sheaves->lock); =20 /* Bootstrap or debug cache, back off */ if (unlikely(!cache_has_sheaves(s))) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); return NULL; } =20 if (pcs->spare && pcs->spare->size > 0) { swap(pcs->main, pcs->spare); return pcs; } =20 barn =3D get_barn(s); if (!barn) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); return NULL; } =20 allow_spin =3D gfpflags_allow_spinning(gfp); =20 full =3D barn_replace_empty_sheaf(barn, pcs->main, allow_spin); =20 if (full) { stat(s, BARN_GET); pcs->main =3D full; @@ -4629,21 +4631,21 @@ __pcs_replace_empty_main(struct kmem_cache *s, stru= ct slub_percpu_sheaves *pcs, =20 if (allow_spin) { if (pcs->spare) { empty =3D pcs->spare; pcs->spare =3D NULL; } else { empty =3D barn_get_empty_sheaf(barn, true); } } =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); pcs =3D NULL; =20 if (!allow_spin) return NULL; =20 if (!empty) { empty =3D alloc_empty_sheaf(s, gfp); if (!empty) return NULL; } @@ -4655,21 +4657,21 @@ __pcs_replace_empty_main(struct kmem_cache *s, stru= ct slub_percpu_sheaves *pcs, */ sheaf_flush_unused(s, empty); free_empty_sheaf(s, empty); =20 return NULL; } =20 full =3D empty; empty =3D NULL; =20 - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) goto barn_put; pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 /* * If we put any empty or full sheaf to the barn below, it's due to * racing or being migrated to a different cpu. Breaching the barn's * sheaf limits should be thus rare enough so just ignore them to * simplify the recovery. 
*/ =20 @@ -4733,121 +4735,121 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t = gfp, int node) =20 /* * We assume the percpu sheaves contain only local objects although it's * not completely guaranteed, so we verify later. */ if (unlikely(node_requested && node !=3D numa_mem_id())) { stat(s, ALLOC_NODE_MISMATCH); return NULL; } =20 - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) return NULL; =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (unlikely(pcs->main->size =3D=3D 0)) { pcs =3D __pcs_replace_empty_main(s, pcs, gfp); if (unlikely(!pcs)) return NULL; } =20 object =3D pcs->main->objects[pcs->main->size - 1]; =20 if (unlikely(node_requested)) { /* * Verify that the object was from the node we want. This could * be false because of cpu migration during an unlocked part of * the current allocation or previous freeing process. */ if (page_to_nid(virt_to_page(object)) !=3D node) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); stat(s, ALLOC_NODE_MISMATCH); return NULL; } } =20 pcs->main->size--; =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 stat(s, ALLOC_FASTPATH); =20 return object; } =20 static __fastpath_inline unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t s= ize, void **p) { struct slub_percpu_sheaves *pcs; struct slab_sheaf *main; unsigned int allocated =3D 0; unsigned int batch; =20 next_batch: - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) return allocated; =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (unlikely(pcs->main->size =3D=3D 0)) { =20 struct slab_sheaf *full; struct node_barn *barn; =20 if (unlikely(!cache_has_sheaves(s))) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); return allocated; } =20 if (pcs->spare && pcs->spare->size > 0) { swap(pcs->main, pcs->spare); goto do_alloc; } =20 barn =3D get_barn(s); if (!barn) 
{ - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); return allocated; } =20 full =3D barn_replace_empty_sheaf(barn, pcs->main, gfpflags_allow_spinning(gfp)); =20 if (full) { stat(s, BARN_GET); pcs->main =3D full; goto do_alloc; } =20 stat(s, BARN_GET_FAIL); =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 /* * Once full sheaves in barn are depleted, let the bulk * allocation continue from slab pages, otherwise we would just * be copying arrays of pointers twice. */ return allocated; } =20 do_alloc: =20 main =3D pcs->main; batch =3D min(size, main->size); =20 main->size -=3D batch; memcpy(p, main->objects + main->size, batch * sizeof(void *)); =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 stat_add(s, ALLOC_FASTPATH, batch); =20 allocated +=3D batch; =20 if (batch < size) { p +=3D batch; size -=3D batch; goto next_batch; } @@ -5017,40 +5019,40 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_= t gfp, unsigned int size) &sheaf->objects[0])) { kfree(sheaf); return NULL; } =20 sheaf->size =3D size; =20 return sheaf; } =20 - local_lock(&s->cpu_sheaves->lock); + pw_lock_local(&s->cpu_sheaves->lock); pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (pcs->spare) { sheaf =3D pcs->spare; pcs->spare =3D NULL; stat(s, SHEAF_PREFILL_FAST); } else { barn =3D get_barn(s); =20 stat(s, SHEAF_PREFILL_SLOW); if (barn) sheaf =3D barn_get_full_or_empty_sheaf(barn); if (sheaf && sheaf->size) stat(s, BARN_GET); else stat(s, BARN_GET_FAIL); } =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 =20 if (!sheaf) sheaf =3D alloc_empty_sheaf(s, gfp); =20 if (sheaf) { sheaf->capacity =3D s->sheaf_capacity; sheaf->pfmemalloc =3D false; =20 if (sheaf->size < size && @@ -5080,31 +5082,31 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, = gfp_t gfp, struct slub_percpu_sheaves *pcs; struct node_barn *barn; =20 if (unlikely((sheaf->capacity !=3D 
s->sheaf_capacity) || sheaf->pfmemalloc)) { sheaf_flush_unused(s, sheaf); kfree(sheaf); return; } =20 - local_lock(&s->cpu_sheaves->lock); + pw_lock_local(&s->cpu_sheaves->lock); pcs =3D this_cpu_ptr(s->cpu_sheaves); barn =3D get_barn(s); =20 if (!pcs->spare) { pcs->spare =3D sheaf; sheaf =3D NULL; stat(s, SHEAF_RETURN_FAST); } =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 if (!sheaf) return; =20 stat(s, SHEAF_RETURN_SLOW); =20 /* * If the barn has too many full sheaves or we fail to refill the sheaf, * simply flush and free it. */ @@ -5627,21 +5629,21 @@ static void __slab_free(struct kmem_cache *s, struc= t slab *slab, * An alternative scenario that gets us here is when we fail * barn_replace_full_sheaf(), because there's no empty sheaf available in = the * barn, so we had to allocate it by alloc_empty_sheaf(). But because we s= aw the * limit on full sheaves was not exceeded, we assume it didn't change and = just * put the full sheaf there. */ static void __pcs_install_empty_sheaf(struct kmem_cache *s, struct slub_percpu_sheaves *pcs, struct slab_sheaf *empty, struct node_barn *barn) { - lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock)); + pw_lockdep_assert_held(&s->cpu_sheaves->lock); =20 /* This is what we expect to find if nobody interrupted us. 
*/ if (likely(!pcs->spare)) { pcs->spare =3D pcs->main; pcs->main =3D empty; return; } =20 /* * Unlikely because if the main sheaf had space, we would have just @@ -5678,31 +5680,31 @@ static void __pcs_install_empty_sheaf(struct kmem_c= ache *s, */ static struct slub_percpu_sheaves * __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *= pcs, bool allow_spin) { struct slab_sheaf *empty; struct node_barn *barn; bool put_fail; =20 restart: - lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock)); + pw_lockdep_assert_held(&s->cpu_sheaves->lock); =20 /* Bootstrap or debug cache, back off */ if (unlikely(!cache_has_sheaves(s))) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); return NULL; } =20 barn =3D get_barn(s); if (!barn) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); return NULL; } =20 put_fail =3D false; =20 if (!pcs->spare) { empty =3D barn_get_empty_sheaf(barn, allow_spin); if (empty) { pcs->spare =3D pcs->main; pcs->main =3D empty; @@ -5725,107 +5727,107 @@ __pcs_replace_full_main(struct kmem_cache *s, str= uct slub_percpu_sheaves *pcs, } =20 /* sheaf_flush_unused() doesn't support !allow_spin */ if (PTR_ERR(empty) =3D=3D -E2BIG && allow_spin) { /* Since we got here, spare exists and is full */ struct slab_sheaf *to_flush =3D pcs->spare; =20 stat(s, BARN_PUT_FAIL); =20 pcs->spare =3D NULL; - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 sheaf_flush_unused(s, to_flush); empty =3D to_flush; goto got_empty; } =20 /* * We could not replace full sheaf because barn had no empty * sheaves. We can still allocate it and put the full sheaf in * __pcs_install_empty_sheaf(), but if we fail to allocate it, * make sure to count the fail. 
*/ put_fail =3D true; =20 alloc_empty: - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 /* * alloc_empty_sheaf() doesn't support !allow_spin and it's * easier to fall back to freeing directly without sheaves * than add the support (and to sheaf_flush_unused() above) */ if (!allow_spin) return NULL; =20 empty =3D alloc_empty_sheaf(s, GFP_NOWAIT); if (empty) goto got_empty; =20 if (put_fail) stat(s, BARN_PUT_FAIL); =20 if (!sheaf_try_flush_main(s)) return NULL; =20 - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) return NULL; =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 /* * we flushed the main sheaf so it should be empty now, * but in case we got preempted or migrated, we need to * check again */ if (pcs->main->size =3D=3D s->sheaf_capacity) goto restart; =20 return pcs; =20 got_empty: - if (!local_trylock(&s->cpu_sheaves->lock)) { + if (!pw_trylock_local(&s->cpu_sheaves->lock)) { barn_put_empty_sheaf(barn, empty); return NULL; } =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); __pcs_install_empty_sheaf(s, pcs, empty, barn); =20 return pcs; } =20 /* * Free an object to the percpu sheaves. * The object is expected to have passed slab_free_hook() already. 
*/ static __fastpath_inline bool free_to_pcs(struct kmem_cache *s, void *object, bool allow_spin) { struct slub_percpu_sheaves *pcs; =20 - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) return false; =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (unlikely(pcs->main->size =3D=3D s->sheaf_capacity)) { =20 pcs =3D __pcs_replace_full_main(s, pcs, allow_spin); if (unlikely(!pcs)) return false; } =20 pcs->main->objects[pcs->main->size++] =3D object; =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 stat(s, FREE_FASTPATH); =20 return true; } =20 static void rcu_free_sheaf(struct rcu_head *head) { struct slab_sheaf *sheaf; struct node_barn *barn =3D NULL; @@ -5898,63 +5900,63 @@ static DEFINE_WAIT_OVERRIDE_MAP(kfree_rcu_sheaf_map= , LD_WAIT_CONFIG); bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj) { struct slub_percpu_sheaves *pcs; struct slab_sheaf *rcu_sheaf; =20 if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT))) return false; =20 lock_map_acquire_try(&kfree_rcu_sheaf_map); =20 - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) goto fail; =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (unlikely(!pcs->rcu_free)) { =20 struct slab_sheaf *empty; struct node_barn *barn; =20 /* Bootstrap or debug cache, fall back */ if (unlikely(!cache_has_sheaves(s))) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); goto fail; } =20 if (pcs->spare && pcs->spare->size =3D=3D 0) { pcs->rcu_free =3D pcs->spare; pcs->spare =3D NULL; goto do_free; } =20 barn =3D get_barn(s); if (!barn) { - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); goto fail; } =20 empty =3D barn_get_empty_sheaf(barn, true); =20 if (empty) { pcs->rcu_free =3D empty; goto do_free; } =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 empty =3D alloc_empty_sheaf(s, GFP_NOWAIT); =20 if (!empty) 
goto fail; =20 - if (!local_trylock(&s->cpu_sheaves->lock)) { + if (!pw_trylock_local(&s->cpu_sheaves->lock)) { barn_put_empty_sheaf(barn, empty); goto fail; } =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (unlikely(pcs->rcu_free)) barn_put_empty_sheaf(barn, empty); else pcs->rcu_free =3D empty; @@ -5971,27 +5973,27 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *= obj) rcu_sheaf->objects[rcu_sheaf->size++] =3D obj; =20 if (likely(rcu_sheaf->size < s->sheaf_capacity)) { rcu_sheaf =3D NULL; } else { pcs->rcu_free =3D NULL; rcu_sheaf->node =3D numa_node_id(); } =20 /* - * we flush before local_unlock to make sure a racing + * we flush before pw_unlock_local to make sure a racing * flush_all_rcu_sheaves() doesn't miss this sheaf */ if (rcu_sheaf) call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf); =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 stat(s, FREE_RCU_SHEAF); lock_map_release(&kfree_rcu_sheaf_map); return true; =20 fail: stat(s, FREE_RCU_SHEAF_FAIL); lock_map_release(&kfree_rcu_sheaf_map); return false; } @@ -6082,21 +6084,21 @@ static void free_to_pcs_bulk(struct kmem_cache *s, = size_t size, void **p) continue; } =20 i++; } =20 if (!size) goto flush_remote; =20 next_batch: - if (!local_trylock(&s->cpu_sheaves->lock)) + if (!pw_trylock_local(&s->cpu_sheaves->lock)) goto fallback; =20 pcs =3D this_cpu_ptr(s->cpu_sheaves); =20 if (likely(pcs->main->size < s->sheaf_capacity)) goto do_free; =20 barn =3D get_barn(s); if (!barn) goto no_empty; @@ -6125,37 +6127,37 @@ static void free_to_pcs_bulk(struct kmem_cache *s, = size_t size, void **p) stat(s, BARN_PUT); pcs->main =3D empty; =20 do_free: main =3D pcs->main; batch =3D min(size, s->sheaf_capacity - main->size); =20 memcpy(main->objects + main->size, p, batch * sizeof(void *)); main->size +=3D batch; =20 - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 stat_add(s, FREE_FASTPATH, batch); =20 if (batch < size) { p +=3D batch; size 
-=3D batch; goto next_batch; } =20 if (remote_nr) goto flush_remote; =20 return; =20 no_empty: - local_unlock(&s->cpu_sheaves->lock); + pw_unlock_local(&s->cpu_sheaves->lock); =20 /* * if we depleted all empty sheaves in the barn or there are too * many full sheaves, free the rest to slab pages */ fallback: __kmem_cache_free_bulk(s, size, p); stat_add(s, FREE_SLOWPATH, size); =20 flush_remote: @@ -7554,21 +7556,21 @@ static inline int alloc_kmem_cache_stats(struct kme= m_cache *s) static int init_percpu_sheaves(struct kmem_cache *s) { static struct slab_sheaf bootstrap_sheaf =3D {}; int cpu; =20 for_each_possible_cpu(cpu) { struct slub_percpu_sheaves *pcs; =20 pcs =3D per_cpu_ptr(s->cpu_sheaves, cpu); =20 - local_trylock_init(&pcs->lock); + pw_trylock_init(&pcs->lock); =20 /* * Bootstrap sheaf has zero size so fast-path allocation fails. * It has also size =3D=3D s->sheaf_capacity, so fast-path free * fails. In the slow paths we recognize the situation by * checking s->sheaf_capacity. This allows fast paths to assume * s->cpu_sheaves and pcs->main always exists and are valid. * It's also safe to share the single static bootstrap_sheaf * with zero-sized objects array as it's never modified. * --=20 2.54.0