From: Mathieu Desnoyers
To: Boqun Feng
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman, Sebastian Andrzej Siewior, "Paul E. McKenney", Will Deacon, Peter Zijlstra, Alan Stern, John Stultz, Linus Torvalds, Andrew Morton, Frederic Weisbecker, Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael@gmail.com, Mateusz Guzik, Jonas Oberhauser, rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev
Subject: [RFC PATCH v5 1/2] hazptr: Implement Hazard Pointers
Date: Mon, 23 Feb 2026 15:44:17 -0500
Message-Id: <20260223204418.1429025-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20260223204418.1429025-1-mathieu.desnoyers@efficios.com>
References: <20260223204418.1429025-1-mathieu.desnoyers@efficios.com>

This API provides existence guarantees of objects through Hazard
Pointers [1] (hazptr). Its main benefit over RCU is that it allows
fast reclaim of HP-protected pointers without needing to wait for a
grace period.
This implementation has 4 statically allocated hazard pointer slots
per CPU for the fast path, and relies on an on-stack backup slot
allocated by the hazard pointer user as a fallback in case no per-CPU
slot is available.

It integrates with the scheduler to migrate per-CPU slots to the
backup slot on context switch. This ensures that the per-CPU slots
won't stay occupied by blocked or preempted tasks holding on to
hazard pointers for a long time.

References:

[1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
     lock-free objects," in IEEE Transactions on Parallel and
     Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004

Link: https://lpc.events/event/19/contributions/2082/
Link: https://lore.kernel.org/lkml/j3scdl5iymjlxavomgc6u5ndg3svhab6ga23dr36o4f5mt333w@7xslvq6b6hmv/
Link: https://lpc.events/event/18/contributions/1731/
Signed-off-by: Mathieu Desnoyers
Cc: Nicholas Piggin
Cc: Michael Ellerman
Cc: Greg Kroah-Hartman
Cc: Sebastian Andrzej Siewior
Cc: "Paul E. McKenney"
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Boqun Feng
Cc: Alan Stern
Cc: John Stultz
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Frederic Weisbecker
Cc: Joel Fernandes
Cc: Josh Triplett
Cc: Uladzislau Rezki
Cc: Steven Rostedt
Cc: Lai Jiangshan
Cc: Zqiang
Cc: Ingo Molnar
Cc: Waiman Long
Cc: Mark Rutland
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: maged.michael@gmail.com
Cc: Mateusz Guzik
Cc: Jonas Oberhauser
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: lkmm@lists.linux.dev
---
Changes since v4:

- Fold scheduler integration.
- Actually set the ctx slot to the backup slot on context switch.
- Remove the CONFIG_PREEMPT_HAZPTR config option.
- Use per-CPU ctx pointers for context-switch slot tracking rather
  than per-task lists. This accelerates the hazptr acquire/release
  fast path.
- Guarantee scan forward progress with a two-list scheme.
- Reimplement the hazptr acquire with a temporary wildcard to
  eliminate a dependency on the addr_p load, which is likely to cause
  a pipeline stall due to the needed memory barrier.
  This simplifies the algorithm, removes the need for a pointer
  re-load + comparison, and is expected to be faster on some
  architectures.
- Reduce the number of per-CPU slots to 4, and introduce a
  hazptr_slot_item structure to contain both the slot and ctx
  pointers. Reducing the number of slots to 4 makes sure all the
  slot and ctx pointers fit in a single cache line.
- Rebased on v7.0-rc1.

Changes since v3:

- Rename hazptr_retire to hazptr_release.
- Remove domains.
- Introduce "backup_slot" within the hazptr context structure (on
  stack) to handle slot overflow.
- Rename hazptr_try_protect to hazptr_acquire.
- Preallocate 8 per-CPU slots, and rely on caller-provided backup
  slots (typically on stack) for out-of-slots situations.

Changes since v2:

- Address Peter Zijlstra's comments.
- Address Paul E. McKenney's comments.

Changes since v0:

- Remove slot variable from hp_dereference_allocate().
---
 include/linux/hazptr.h | 197 +++++++++++++++++++++++++++++++++
 init/main.c            |   2 +
 kernel/Makefile        |   2 +-
 kernel/hazptr.c        | 242 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c    |   2 +
 5 files changed, 444 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hazptr.h
 create mode 100644 kernel/hazptr.c

diff --git a/include/linux/hazptr.h b/include/linux/hazptr.h
new file mode 100644
index 000000000000..461f481a480b
--- /dev/null
+++ b/include/linux/hazptr.h
@@ -0,0 +1,197 @@
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+#ifndef _LINUX_HAZPTR_H
+#define _LINUX_HAZPTR_H
+
+/*
+ * hazptr: Hazard Pointers
+ *
+ * This API provides existence guarantees of objects through hazard
+ * pointers.
+ *
+ * Its main benefit over RCU is that it allows fast reclaim of
+ * HP-protected pointers without needing to wait for a grace period.
+ *
+ * References:
+ *
+ * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
+ *      lock-free objects," in IEEE Transactions on Parallel and
+ *      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
+ */
+
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/list.h>
+#include <linux/cleanup.h>
+
+/* 4 slots (each sizeof(hazptr_slot_item)) fit in a single 64-byte cache line. */
+#define NR_HAZPTR_PERCPU_SLOTS	4
+#define HAZPTR_WILDCARD		((void *) 0x1UL)
+
+/*
+ * Hazard pointer slot.
+ */
+struct hazptr_slot {
+	void *addr;
+};
+
+struct hazptr_overflow_list;
+
+struct hazptr_backup_slot {
+	struct hlist_node overflow_node;
+	struct hazptr_slot slot;
+	/* Overflow list where the backup slot is added. */
+	struct hazptr_overflow_list *overflow_list;
+};
+
+struct hazptr_ctx {
+	struct hazptr_slot *slot;
+	/* Backup slot in case all per-CPU slots are used. */
+	struct hazptr_backup_slot backup_slot;
+	struct hlist_node preempt_node;
+};
+
+struct hazptr_slot_ctx {
+	struct hazptr_ctx *ctx;
+};
+
+struct hazptr_slot_item {
+	struct hazptr_slot slot;
+	struct hazptr_slot_ctx ctx;
+};
+
+struct hazptr_percpu_slots {
+	struct hazptr_slot_item items[NR_HAZPTR_PERCPU_SLOTS];
+} ____cacheline_aligned;
+
+DECLARE_PER_CPU(struct hazptr_percpu_slots, hazptr_percpu_slots);
+
+void *__hazptr_acquire(struct hazptr_ctx *ctx, void * const *addr_p);
+
+/*
+ * hazptr_synchronize: Wait until @addr is released from all slots.
+ *
+ * Wait to observe that each slot contains a value that differs from
+ * @addr before returning.
+ * Should be called from preemptible context.
+ */
+void hazptr_synchronize(void *addr);
+
+/*
+ * hazptr_chain_backup_slot: Chain backup slot into overflow list.
+ *
+ * Reset the backup slot address, and chain the backup slot into the
+ * current overflow list.
+ */
+struct hazptr_slot *hazptr_chain_backup_slot(struct hazptr_ctx *ctx);
+
+/*
+ * hazptr_unchain_backup_slot: Unchain backup slot from overflow list.
+ */
+void hazptr_unchain_backup_slot(struct hazptr_ctx *ctx);
+
+static inline
+bool hazptr_slot_is_backup(struct hazptr_ctx *ctx, struct hazptr_slot *slot)
+{
+	return slot == &ctx->backup_slot.slot;
+}
+
+static inline
+void hazptr_note_context_switch(void)
+{
+	struct hazptr_percpu_slots *percpu_slots = this_cpu_ptr(&hazptr_percpu_slots);
+	unsigned int idx;
+
+	for (idx = 0; idx < NR_HAZPTR_PERCPU_SLOTS; idx++) {
+		struct hazptr_slot_item *item = &percpu_slots->items[idx];
+		struct hazptr_slot *slot = &item->slot, *backup_slot;
+		struct hazptr_ctx *ctx;
+
+		if (!slot->addr)
+			continue;
+		ctx = item->ctx.ctx;
+		backup_slot = hazptr_chain_backup_slot(ctx);
+		/*
+		 * Move the hazard pointer from the per-CPU slot to the
+		 * backup slot. This requires hazptr_synchronize() to
+		 * iterate on the per-CPU slots with load-acquire before
+		 * iterating on the overflow list.
+		 */
+		WRITE_ONCE(backup_slot->addr, slot->addr);
+		/*
+		 * store-release orders the store to the backup slot addr
+		 * before the store to the per-CPU slot addr.
+		 */
+		smp_store_release(&slot->addr, NULL);
+		/* Use the backup slot for the context. */
+		ctx->slot = backup_slot;
+	}
+}
+
+/*
+ * hazptr_acquire: Load pointer at address and protect it with a hazard
+ * pointer.
+ *
+ * Load @addr_p, and protect the loaded pointer with a hazard pointer.
+ * When using hazptr_acquire() from interrupt handlers, the acquired
+ * slots need to be released before returning from the interrupt
+ * handler.
+ *
+ * Returns a non-NULL protected address if the loaded pointer is
+ * non-NULL. Returns NULL if the loaded pointer is NULL.
+ *
+ * On success the protecting hazptr slot is stored in @ctx->slot.
+ */
+static inline
+void *hazptr_acquire(struct hazptr_ctx *ctx, void * const *addr_p)
+{
+	struct hazptr_percpu_slots *percpu_slots;
+	struct hazptr_slot_item *slot_item;
+	struct hazptr_slot *slot;
+	void *addr;
+
+	guard(preempt)();
+	percpu_slots = this_cpu_ptr(&hazptr_percpu_slots);
+	slot_item = &percpu_slots->items[0];
+	slot = &slot_item->slot;
+	if (unlikely(slot->addr))
+		return __hazptr_acquire(ctx, addr_p);
+	WRITE_ONCE(slot->addr, HAZPTR_WILDCARD);	/* Store B */
+
+	/* Memory ordering: Store B before Load A. */
+	smp_mb();
+
+	/*
+	 * Load @addr_p after storing the wildcard to the hazard
+	 * pointer slot.
+	 */
+	addr = READ_ONCE(*addr_p);			/* Load A */
+
+	/*
+	 * We don't care about the ordering of Store C. It simply
+	 * replaces the wildcard with a more specific address. If addr
+	 * is NULL, we simply store NULL into the slot.
+	 */
+	WRITE_ONCE(slot->addr, addr);			/* Store C */
+	slot_item->ctx.ctx = ctx;
+	ctx->slot = slot;
+	return addr;
+}
+
+/* Release the protected hazard pointer from @ctx->slot. */
+static inline
+void hazptr_release(struct hazptr_ctx *ctx, void *addr)
+{
+	struct hazptr_slot *slot;
+
+	if (!addr)
+		return;
+	guard(preempt)();
+	slot = ctx->slot;
+	smp_store_release(&slot->addr, NULL);
+	if (unlikely(hazptr_slot_is_backup(ctx, slot)))
+		hazptr_unchain_backup_slot(ctx);
+}
+
+void hazptr_init(void);
+
+#endif /* _LINUX_HAZPTR_H */
diff --git a/init/main.c b/init/main.c
index 1cb395dd94e4..b66017629935 100644
--- a/init/main.c
+++ b/init/main.c
@@ -105,6 +105,7 @@
 #include
 #include
 #include
+#include <linux/hazptr.h>
 #include

 #include
@@ -1101,6 +1102,7 @@ void start_kernel(void)
 	workqueue_init_early();

 	rcu_init();
+	hazptr_init();
 	kvfree_rcu_init();

 	/* Trace events are available after this */
diff --git a/kernel/Makefile b/kernel/Makefile
index 6785982013dc..b7cef6e23038 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -7,7 +7,7 @@ obj-y = fork.o exec_domain.o panic.o \
 	    cpu.o exit.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o user.o \
 	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
-	    extable.o params.o \
+	    extable.o params.o hazptr.o \
 	    kthread.o sys_ni.o nsproxy.o nstree.o nscommon.o \
 	    notifier.o ksysfs.o cred.o reboot.o \
 	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
diff --git a/kernel/hazptr.c b/kernel/hazptr.c
new file mode 100644
index 000000000000..a63ac681cb85
--- /dev/null
+++ b/kernel/hazptr.c
@@ -0,0 +1,242 @@
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+/*
+ * hazptr: Hazard Pointers
+ */
+
+#include <linux/hazptr.h>
+#include <linux/percpu.h>
+#include <linux/spinlock.h>
+#include <linux/mutex.h>
+#include <linux/cleanup.h>
+#include <linux/export.h>
+
+struct hazptr_overflow_list {
+	raw_spinlock_t lock;	/* Lock protecting overflow list and list generation. */
+	struct hlist_head head;	/* Overflow list head. */
+	uint64_t gen;		/* Overflow list generation. */
+};
+
+/*
+ * Flip between two lists to guarantee list scan forward progress
+ * even with frequent generation counter increments.
+ * The list additions are always done on a different list than the
+ * one used for scan. The scan successively iterates on both lists.
+ * Therefore, only list removals can cause the iteration to retry,
+ * and the number of removals is limited to the number of list
+ * elements.
+ */
+struct hazptr_overflow_list_flip {
+	struct mutex lock;	/* Mutex protecting add_idx from concurrent updates. */
+	unsigned int add_idx;	/* Index of the current flip-list to add to. */
+	struct hazptr_overflow_list array[2];
+};
+
+static DEFINE_PER_CPU(struct hazptr_overflow_list_flip, percpu_overflow_list_flip);
+
+DEFINE_PER_CPU(struct hazptr_percpu_slots, hazptr_percpu_slots);
+EXPORT_PER_CPU_SYMBOL_GPL(hazptr_percpu_slots);
+
+static
+struct hazptr_slot *hazptr_get_free_percpu_slot(struct hazptr_ctx *ctx)
+{
+	struct hazptr_percpu_slots *percpu_slots = this_cpu_ptr(&hazptr_percpu_slots);
+	unsigned int idx;
+
+	for (idx = 0; idx < NR_HAZPTR_PERCPU_SLOTS; idx++) {
+		struct hazptr_slot_item *item = &percpu_slots->items[idx];
+		struct hazptr_slot *slot = &item->slot;
+
+		if (!slot->addr) {
+			item->ctx.ctx = ctx;
+			return slot;
+		}
+	}
+	/* All slots are in use. */
+	return NULL;
+}
+
+/*
+ * Hazard pointer acquire slow path.
+ * Called with preemption disabled.
+ */
+void *__hazptr_acquire(struct hazptr_ctx *ctx, void * const *addr_p)
+{
+	struct hazptr_slot *slot = hazptr_get_free_percpu_slot(ctx);
+	void *addr;
+
+	/*
+	 * If all the per-CPU slots are already in use, fall back
+	 * to the backup slot.
+	 */
+	if (unlikely(!slot))
+		slot = hazptr_chain_backup_slot(ctx);
+	WRITE_ONCE(slot->addr, HAZPTR_WILDCARD);	/* Store B */
+
+	/* Memory ordering: Store B before Load A. */
+	smp_mb();
+
+	/*
+	 * Load @addr_p after storing the wildcard to the hazard
+	 * pointer slot.
+	 */
+	addr = READ_ONCE(*addr_p);			/* Load A */
+
+	/*
+	 * We don't care about the ordering of Store C. It simply
+	 * replaces the wildcard with a more specific address. If addr
+	 * is NULL, we simply store NULL into the slot.
+	 */
+	WRITE_ONCE(slot->addr, addr);			/* Store C */
+	ctx->slot = slot;
+	if (!addr && hazptr_slot_is_backup(ctx, slot))
+		hazptr_unchain_backup_slot(ctx);
+	return addr;
+}
+EXPORT_SYMBOL_GPL(__hazptr_acquire);
+
+/*
+ * Perform piecewise iteration on the overflow list, waiting until
+ * "addr" is not present. The raw spinlock is released and re-taken
+ * between each list item and busy-loop iteration. The overflow list
+ * generation is checked each time the lock is taken, to validate
+ * that the list has not changed before resuming iteration or
+ * busy-waiting. If the generation has changed, retry the entire list
+ * traversal.
+ */
+static
+void hazptr_synchronize_overflow_list(struct hazptr_overflow_list *overflow_list, void *addr)
+{
+	struct hazptr_backup_slot *backup_slot;
+	uint64_t snapshot_gen;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&overflow_list->lock, flags);
+retry:
+	snapshot_gen = overflow_list->gen;
+	hlist_for_each_entry(backup_slot, &overflow_list->head, overflow_node) {
+		/* Busy-wait if the node is found. */
+		for (;;) {
+			void *load_addr = smp_load_acquire(&backup_slot->slot.addr);	/* Load B */
+
+			if (load_addr != addr && load_addr != HAZPTR_WILDCARD)
+				break;
+			raw_spin_unlock_irqrestore(&overflow_list->lock, flags);
+			cpu_relax();
+			raw_spin_lock_irqsave(&overflow_list->lock, flags);
+			if (overflow_list->gen != snapshot_gen)
+				goto retry;
+		}
+		raw_spin_unlock_irqrestore(&overflow_list->lock, flags);
+		/*
+		 * Release the raw spinlock, and validate the
+		 * generation after re-acquiring the lock.
+		 */
+		raw_spin_lock_irqsave(&overflow_list->lock, flags);
+		if (overflow_list->gen != snapshot_gen)
+			goto retry;
+	}
+	raw_spin_unlock_irqrestore(&overflow_list->lock, flags);
+}
+
+static
+void hazptr_synchronize_cpu_slots(int cpu, void *addr)
+{
+	struct hazptr_percpu_slots *percpu_slots = per_cpu_ptr(&hazptr_percpu_slots, cpu);
+	unsigned int idx;
+
+	for (idx = 0; idx < NR_HAZPTR_PERCPU_SLOTS; idx++) {
+		struct hazptr_slot_item *item = &percpu_slots->items[idx];
+
+		/* Busy-wait if the node is found. */
+		smp_cond_load_acquire(&item->slot.addr, VAL != addr && VAL != HAZPTR_WILDCARD);	/* Load B */
+	}
+}
+
+/*
+ * hazptr_synchronize: Wait until @addr is released from all slots.
+ *
+ * Wait to observe that each slot contains a value that differs from
+ * @addr before returning.
+ * Should be called from preemptible context.
+ */
+void hazptr_synchronize(void *addr)
+{
+	int cpu;
+
+	/*
+	 * Busy-wait should only be done from preemptible context.
+	 */
+	lockdep_assert_preemption_enabled();
+
+	/*
+	 * Store A precedes hazptr_synchronize(): it unpublishes addr
+	 * (sets it to NULL or to a different value), and thus hides
+	 * it from hazard pointer readers.
+	 */
+	if (!addr)
+		return;
+	/* Memory ordering: Store A before Load B. */
+	smp_mb();
+	/* Scan all CPUs' slots. */
+	for_each_possible_cpu(cpu) {
+		struct hazptr_overflow_list_flip *overflow_list_flip = per_cpu_ptr(&percpu_overflow_list_flip, cpu);
+		unsigned int scan_idx;
+
+		/* Scan CPU slots. */
+		hazptr_synchronize_cpu_slots(cpu, addr);
+
+		/*
+		 * Scan the backup slots in the per-CPU overflow lists.
+		 * Forward progress is guaranteed by scanning one list
+		 * while new elements are added into the other list.
+		 */
+		guard(mutex)(&overflow_list_flip->lock);
+		scan_idx = overflow_list_flip->add_idx ^ 1;
+		hazptr_synchronize_overflow_list(&overflow_list_flip->array[scan_idx], addr);
+		/* Flip the current list. */
+		WRITE_ONCE(overflow_list_flip->add_idx, scan_idx);
+		hazptr_synchronize_overflow_list(&overflow_list_flip->array[scan_idx ^ 1], addr);
+	}
+}
+EXPORT_SYMBOL_GPL(hazptr_synchronize);
+
+struct hazptr_slot *hazptr_chain_backup_slot(struct hazptr_ctx *ctx)
+{
+	struct hazptr_overflow_list_flip *overflow_list_flip = this_cpu_ptr(&percpu_overflow_list_flip);
+	unsigned int list_idx = READ_ONCE(overflow_list_flip->add_idx);
+	struct hazptr_overflow_list *overflow_list = &overflow_list_flip->array[list_idx];
+	struct hazptr_slot *slot = &ctx->backup_slot.slot;
+
+	slot->addr = NULL;
+	guard(raw_spinlock_irqsave)(&overflow_list->lock);
+	overflow_list->gen++;
+	hlist_add_head(&ctx->backup_slot.overflow_node, &overflow_list->head);
+	ctx->backup_slot.overflow_list = overflow_list;
+	return slot;
+}
+EXPORT_SYMBOL_GPL(hazptr_chain_backup_slot);
+
+void hazptr_unchain_backup_slot(struct hazptr_ctx *ctx)
+{
+	struct hazptr_overflow_list *overflow_list = ctx->backup_slot.overflow_list;
+
+	guard(raw_spinlock_irqsave)(&overflow_list->lock);
+	overflow_list->gen++;
+	hlist_del(&ctx->backup_slot.overflow_node);
+}
+EXPORT_SYMBOL_GPL(hazptr_unchain_backup_slot);
+
+void __init hazptr_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct hazptr_overflow_list_flip *overflow_list_flip = per_cpu_ptr(&percpu_overflow_list_flip, cpu);
+
+		mutex_init(&overflow_list_flip->lock);
+		for (int i = 0; i < 2; i++) {
+			raw_spin_lock_init(&overflow_list_flip->array[i].lock);
+			INIT_HLIST_HEAD(&overflow_list_flip->array[i].head);
+		}
+	}
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 759777694c78..b3e10be20329 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -60,6 +60,7 @@
 #include
 #include
 #include
+#include <linux/hazptr.h>
 #include
 #include
 #include
@@ -6790,6 +6791,7 @@ static void __sched notrace __schedule(int sched_mode)
 	local_irq_disable();
 	rcu_note_context_switch(preempt);
 	migrate_disable_switch(rq, prev);
+	hazptr_note_context_switch();

 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
--
2.39.5

From: Mathieu Desnoyers
To: Boqun Feng
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev
Subject: [RFC PATCH v5 2/2] hazptr: Add refscale test
Date: Mon, 23 Feb 2026 15:44:18 -0500
Message-Id: <20260223204418.1429025-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20260223204418.1429025-1-mathieu.desnoyers@efficios.com>
References: <20260223204418.1429025-1-mathieu.desnoyers@efficios.com>

Add the refscale test for hazptr to measure the reader-side
performance.
Signed-off-by: Mathieu Desnoyers
Co-developed-by: Boqun Feng
---
 kernel/rcu/refscale.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
index c158b6a947cd..7d64dfe78327 100644
--- a/kernel/rcu/refscale.c
+++ b/kernel/rcu/refscale.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include <linux/hazptr.h>
 #include
 #include
 #include
@@ -1210,6 +1211,47 @@ static const struct ref_scale_ops typesafe_seqlock_ops = {
 	.name		= "typesafe_seqlock"
 };

+static void ref_hazptr_read_section(const int nloops)
+{
+	static void *ref_hazptr_read_section_ptr = ref_hazptr_read_section;
+	int i;
+
+	for (i = nloops; i >= 0; i--) {
+		struct hazptr_ctx ctx;
+		void *addr;
+
+		addr = hazptr_acquire(&ctx, &ref_hazptr_read_section_ptr);
+		hazptr_release(&ctx, addr);
+	}
+}
+
+static void ref_hazptr_delay_section(const int nloops, const int udl, const int ndl)
+{
+	static void *ref_hazptr_delay_section_ptr = ref_hazptr_delay_section;
+	int i;
+
+	for (i = nloops; i >= 0; i--) {
+		struct hazptr_ctx ctx;
+		void *addr;
+
+		addr = hazptr_acquire(&ctx, &ref_hazptr_delay_section_ptr);
+		un_delay(udl, ndl);
+		hazptr_release(&ctx, addr);
+	}
+}
+
+static bool ref_hazptr_init(void)
+{
+	return true;
+}
+
+static const struct ref_scale_ops hazptr_ops = {
+	.init		= ref_hazptr_init,
+	.readsection	= ref_hazptr_read_section,
+	.delaysection	= ref_hazptr_delay_section,
+	.name		= "hazptr"
+};
+
 static void rcu_scale_one_reader(void)
 {
 	if (readdelay <= 0)
@@ -1524,6 +1566,7 @@ ref_scale_init(void)
 		&sched_clock_ops, &clock_ops, &jiffies_ops, &preempt_ops,
 		&bh_ops, &irq_ops, &irqsave_ops, &typesafe_ref_ops,
 		&typesafe_lock_ops, &typesafe_seqlock_ops,
+		&hazptr_ops,
 	};

 	if (!torture_init_begin(scale_type, verbose))
--
2.39.5