From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, Waiman Long
Subject: [PATCH v2 1/8] locking/rwsem: Minor code refactoring in rwsem_mark_wake()
Date: Mon, 27 Mar 2023 16:24:06 -0400
Message-Id: <20230327202413.1955856-2-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>

Rename "oldcount" to "count" since it does not always hold the old count
value. Also do some minor refactoring to reduce indentation. There is no
functional change.

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20230216210933.1169097-2-longman@redhat.com
---
 kernel/locking/rwsem.c | 44 +++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index acb5a50309a1..e589f69793df 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -40,7 +40,7 @@
  *
  * When the rwsem is reader-owned and a spinning writer has timed out,
  * the nonspinnable bit will be set to disable optimistic spinning.
- 
+ *
  * When a writer acquires a rwsem, it puts its task_struct pointer
  * into the owner field. It is cleared after an unlock.
  *
@@ -413,7 +413,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			    struct wake_q_head *wake_q)
 {
 	struct rwsem_waiter *waiter, *tmp;
-	long oldcount, woken = 0, adjustment = 0;
+	long count, woken = 0, adjustment = 0;
 	struct list_head wlist;
 
 	lockdep_assert_held(&sem->wait_lock);
@@ -424,22 +424,23 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 	 */
 	waiter = rwsem_first_waiter(sem);
 
-	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
-		if (wake_type == RWSEM_WAKE_ANY) {
-			/*
-			 * Mark writer at the front of the queue for wakeup.
-			 * Until the task is actually later awoken later by
-			 * the caller, other writers are able to steal it.
-			 * Readers, on the other hand, will block as they
-			 * will notice the queued writer.
-			 */
-			wake_q_add(wake_q, waiter->task);
-			lockevent_inc(rwsem_wake_writer);
-		}
+	if (waiter->type != RWSEM_WAITING_FOR_WRITE)
+		goto wake_readers;
 
-		return;
+	if (wake_type == RWSEM_WAKE_ANY) {
+		/*
+		 * Mark writer at the front of the queue for wakeup.
+		 * Until the task is actually later awoken later by
+		 * the caller, other writers are able to steal it.
+		 * Readers, on the other hand, will block as they
+		 * will notice the queued writer.
+		 */
+		wake_q_add(wake_q, waiter->task);
+		lockevent_inc(rwsem_wake_writer);
 	}
+	return;
 
+wake_readers:
 	/*
 	 * No reader wakeup if there are too many of them already.
 	 */
@@ -455,15 +456,15 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		struct task_struct *owner;
 
 		adjustment = RWSEM_READER_BIAS;
-		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-		if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
+		count = atomic_long_fetch_add(adjustment, &sem->count);
+		if (unlikely(count & RWSEM_WRITER_MASK)) {
 			/*
 			 * When we've been waiting "too" long (for writers
 			 * to give up the lock), request a HANDOFF to
 			 * force the issue.
 			 */
 			if (time_after(jiffies, waiter->timeout)) {
-				if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+				if (!(count & RWSEM_FLAG_HANDOFF)) {
 					adjustment -= RWSEM_FLAG_HANDOFF;
 					lockevent_inc(rwsem_rlock_handoff);
 				}
@@ -524,21 +525,21 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 	adjustment = woken * RWSEM_READER_BIAS - adjustment;
 	lockevent_cond_inc(rwsem_wake_reader, woken);
 
-	oldcount = atomic_long_read(&sem->count);
+	count = atomic_long_read(&sem->count);
 	if (list_empty(&sem->wait_list)) {
 		/*
 		 * Combined with list_move_tail() above, this implies
 		 * rwsem_del_waiter().
 		 */
 		adjustment -= RWSEM_FLAG_WAITERS;
-		if (oldcount & RWSEM_FLAG_HANDOFF)
+		if (count & RWSEM_FLAG_HANDOFF)
 			adjustment -= RWSEM_FLAG_HANDOFF;
 	} else if (woken) {
 		/*
 		 * When we've woken a reader, we no longer need to force
 		 * writers to give up the lock and we can clear HANDOFF.
 		 */
-		if (oldcount & RWSEM_FLAG_HANDOFF)
+		if (count & RWSEM_FLAG_HANDOFF)
 			adjustment -= RWSEM_FLAG_HANDOFF;
 	}
 
@@ -844,7 +845,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * Try to acquire the lock
 		 */
 		taken = rwsem_try_write_lock_unqueued(sem);
-
 		if (taken)
 			break;
 
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/8] locking/rwsem: Enforce queueing when HANDOFF
Date: Mon, 27 Mar 2023 16:24:07 -0400
Message-Id: <20230327202413.1955856-3-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>

From: Peter Zijlstra

Ensure that the HANDOFF bit disables all spinning and stealing: once a
waiter has requested a handoff, everyone else must queue behind it.
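The intended behavior can be shown with a minimal userspace analogue (all
names and flag values below are illustrative, not the kernel's): a would-be
lock stealer must treat a pending handoff exactly like a held lock and
fall back to queueing.

	/*
	 * Minimal userspace sketch, not the kernel code.  LOCKED/HANDOFF
	 * and try_steal() are illustrative names only.
	 */
	#include <stdatomic.h>
	#include <stdbool.h>

	#define LOCKED  0x1UL
	#define HANDOFF 0x2UL

	static atomic_ulong lock_word;

	/*
	 * Opportunistic steal: must fail not only while the lock is held
	 * but also while a handoff is pending, so the designated waiter
	 * is guaranteed to win.
	 */
	static bool try_steal(void)
	{
		unsigned long v = atomic_load_explicit(&lock_word,
						       memory_order_relaxed);

		while (!(v & (LOCKED | HANDOFF))) {
			if (atomic_compare_exchange_weak_explicit(&lock_word,
					&v, v | LOCKED,
					memory_order_acquire,
					memory_order_relaxed))
				return true;	/* stolen */
		}
		return false;	/* locked or handoff pending: go queue */
	}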
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/locking/rwsem.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e589f69793df..4b9e492abd59 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -468,7 +468,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 					adjustment -= RWSEM_FLAG_HANDOFF;
 					lockevent_inc(rwsem_rlock_handoff);
 				}
+				/*
+				 * With HANDOFF set for reader, we must
+				 * terminate all spinning.
+				 */
 				waiter->handoff_set = true;
+				rwsem_set_nonspinnable(sem);
 			}
 
 			atomic_long_add(-adjustment, &sem->count);
@@ -755,6 +760,10 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 
 	owner = rwsem_owner_flags(sem, &flags);
 	state = rwsem_owner_state(owner, flags);
+
+	if (owner == current)
+		return OWNER_NONSPINNABLE; /* Handoff granted */
+
 	if (state != OWNER_WRITER)
 		return state;
 
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 3/8] locking/rwsem: Rework writer wakeup
Date: Mon, 27 Mar 2023 16:24:08 -0400
Message-Id: <20230327202413.1955856-4-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>

From: Peter Zijlstra

Currently readers and writers have distinctly different wait/wake
methods. For readers the ->count adjustment happens on the wakeup side,
while for writers the ->count adjustment happens on the wait side.

This asymmetry is unfortunate since the wake side has an additional
guarantee: it has observed the unlocked state, and thus it knows that
speculative READER_BIAS perturbations on ->count are just that,
speculative, and will be undone. Additionally, unifying the wait/wake
methods allows sharing code.

As such, do a straightforward transform of the writer wakeup into the
wake side.
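The READER_BIAS argument can be illustrated with a small userspace
analogue (constants and names here are illustrative, not the kernel's): a
reader's fast path optimistically adds its bias and immediately backs it
out again when it sees a writer, so a wakeup path that has already
observed the unlocked state can treat any bias it sees as transient.

	#include <stdatomic.h>
	#include <stdbool.h>

	#define READER_BIAS	0x100UL
	#define WRITER_LOCKED	0x001UL

	static atomic_ulong count;

	static bool reader_try_fast(void)
	{
		unsigned long c = atomic_fetch_add_explicit(&count, READER_BIAS,
							    memory_order_acquire);

		if (!(c & WRITER_LOCKED))
			return true;	/* got the read lock */

		/* Speculative bias is undone: the perturbation is temporary. */
		atomic_fetch_sub_explicit(&count, READER_BIAS,
					  memory_order_release);
		return false;
	}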
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/locking/rwsem.c | 259 +++++++++++++++++++----------------------
 1 file changed, 123 insertions(+), 136 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 4b9e492abd59..0cc0aa566a6b 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -394,6 +394,108 @@ rwsem_del_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
 	return false;
 }
 
+static inline void
+rwsem_waiter_wake(struct rwsem_waiter *waiter, struct wake_q_head *wake_q)
+{
+	struct task_struct *tsk;
+
+	tsk = waiter->task;
+	get_task_struct(tsk);
+
+	/*
+	 * Ensure calling get_task_struct() before setting the reader
+	 * waiter to nil such that rwsem_down_read_slowpath() cannot
+	 * race with do_exit() by always holding a reference count
+	 * to the task to wakeup.
+	 */
+	smp_store_release(&waiter->task, NULL);
+	/*
+	 * Ensure issuing the wakeup (either by us or someone else)
+	 * after setting the reader waiter to nil.
+	 */
+	wake_q_add_safe(wake_q, tsk);
+}
+
+/*
+ * This function must be called with the sem->wait_lock held to prevent
+ * race conditions between checking the rwsem wait list and setting the
+ * sem->count accordingly.
+ *
+ * Implies rwsem_del_waiter() on success.
+ */
+static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
+					struct rwsem_waiter *waiter)
+{
+
+	struct rwsem_waiter *first = rwsem_first_waiter(sem);
+	long count, new;
+
+	lockdep_assert_held(&sem->wait_lock);
+
+	count = atomic_long_read(&sem->count);
+	do {
+		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
+
+		if (has_handoff) {
+			/*
+			 * Honor handoff bit and yield only when the first
+			 * waiter is the one that set it. Otherwisee, we
+			 * still try to acquire the rwsem.
+			 */
+			if (first->handoff_set && (waiter != first))
+				return false;
+		}
+
+		new = count;
+
+		if (count & RWSEM_LOCK_MASK) {
+			/*
+			 * A waiter (first or not) can set the handoff bit
+			 * if it is an RT task or wait in the wait queue
+			 * for too long.
+			 */
+			if (has_handoff || (!rt_task(waiter->task) &&
+					    !time_after(jiffies, waiter->timeout)))
+				return false;
+
+			new |= RWSEM_FLAG_HANDOFF;
+		} else {
+			new |= RWSEM_WRITER_LOCKED;
+			new &= ~RWSEM_FLAG_HANDOFF;
+
+			if (list_is_singular(&sem->wait_list))
+				new &= ~RWSEM_FLAG_WAITERS;
+		}
+	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
+
+	/*
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
+	 */
+	if (new & RWSEM_FLAG_HANDOFF) {
+		first->handoff_set = true;
+		lockevent_inc(rwsem_wlock_handoff);
+		return false;
+	}
+
+	/*
+	 * Have rwsem_try_write_lock() fully imply rwsem_del_waiter() on
+	 * success.
+	 */
+	list_del(&waiter->list);
+	atomic_long_set(&sem->owner, (long)waiter->task);
+	return true;
+}
+
+static void rwsem_writer_wake(struct rw_semaphore *sem,
+			      struct rwsem_waiter *waiter,
+			      struct wake_q_head *wake_q)
+{
+	if (rwsem_try_write_lock(sem, waiter))
+		rwsem_waiter_wake(waiter, wake_q);
+}
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
@@ -424,23 +526,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 	 */
 	waiter = rwsem_first_waiter(sem);
 
-	if (waiter->type != RWSEM_WAITING_FOR_WRITE)
-		goto wake_readers;
-
-	if (wake_type == RWSEM_WAKE_ANY) {
-		/*
-		 * Mark writer at the front of the queue for wakeup.
-		 * Until the task is actually later awoken later by
-		 * the caller, other writers are able to steal it.
-		 * Readers, on the other hand, will block as they
-		 * will notice the queued writer.
-		 */
-		wake_q_add(wake_q, waiter->task);
-		lockevent_inc(rwsem_wake_writer);
+	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+		if (wake_type == RWSEM_WAKE_ANY)
+			rwsem_writer_wake(sem, waiter, wake_q);
+		return;
 	}
-	return;
 
-wake_readers:
 	/*
 	 * No reader wakeup if there are too many of them already.
 	 */
@@ -552,25 +643,8 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 	atomic_long_add(adjustment, &sem->count);
 
 	/* 2nd pass */
-	list_for_each_entry_safe(waiter, tmp, &wlist, list) {
-		struct task_struct *tsk;
-
-		tsk = waiter->task;
-		get_task_struct(tsk);
-
-		/*
-		 * Ensure calling get_task_struct() before setting the reader
-		 * waiter to nil such that rwsem_down_read_slowpath() cannot
-		 * race with do_exit() by always holding a reference count
-		 * to the task to wakeup.
-		 */
-		smp_store_release(&waiter->task, NULL);
-		/*
-		 * Ensure issuing the wakeup (either by us or someone else)
-		 * after setting the reader waiter to nil.
-		 */
-		wake_q_add_safe(wake_q, tsk);
-	}
+	list_for_each_entry_safe(waiter, tmp, &wlist, list)
+		rwsem_waiter_wake(waiter, wake_q);
 }
 
 /*
@@ -600,77 +674,6 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 	wake_up_q(wake_q);
 }
 
-/*
- * This function must be called with the sem->wait_lock held to prevent
- * race conditions between checking the rwsem wait list and setting the
- * sem->count accordingly.
- *
- * Implies rwsem_del_waiter() on success.
- */
-static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-					struct rwsem_waiter *waiter)
-{
-	struct rwsem_waiter *first = rwsem_first_waiter(sem);
-	long count, new;
-
-	lockdep_assert_held(&sem->wait_lock);
-
-	count = atomic_long_read(&sem->count);
-	do {
-		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
-
-		if (has_handoff) {
-			/*
-			 * Honor handoff bit and yield only when the first
-			 * waiter is the one that set it. Otherwisee, we
-			 * still try to acquire the rwsem.
-			 */
-			if (first->handoff_set && (waiter != first))
-				return false;
-		}
-
-		new = count;
-
-		if (count & RWSEM_LOCK_MASK) {
-			/*
-			 * A waiter (first or not) can set the handoff bit
-			 * if it is an RT task or wait in the wait queue
-			 * for too long.
-			 */
-			if (has_handoff || (!rt_task(waiter->task) &&
-					    !time_after(jiffies, waiter->timeout)))
-				return false;
-
-			new |= RWSEM_FLAG_HANDOFF;
-		} else {
-			new |= RWSEM_WRITER_LOCKED;
-			new &= ~RWSEM_FLAG_HANDOFF;
-
-			if (list_is_singular(&sem->wait_list))
-				new &= ~RWSEM_FLAG_WAITERS;
-		}
-	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
-
-	/*
-	 * We have either acquired the lock with handoff bit cleared or set
-	 * the handoff bit. Only the first waiter can have its handoff_set
-	 * set here to enable optimistic spinning in slowpath loop.
-	 */
-	if (new & RWSEM_FLAG_HANDOFF) {
-		first->handoff_set = true;
-		lockevent_inc(rwsem_wlock_handoff);
-		return false;
-	}
-
-	/*
-	 * Have rwsem_try_write_lock() fully imply rwsem_del_waiter() on
-	 * success.
-	 */
-	list_del(&waiter->list);
-	rwsem_set_owner(sem);
-	return true;
-}
-
 /*
  * The rwsem_spin_on_owner() function returns the following 4 values
  * depending on the lock owner state.
@@ -1081,7 +1084,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	for (;;) {
 		set_current_state(state);
 		if (!smp_load_acquire(&waiter.task)) {
-			/* Matches rwsem_mark_wake()'s smp_store_release(). */
+			/* Matches rwsem_waiter_wake()'s smp_store_release(). */
 			break;
 		}
 		if (signal_pending_state(state, current)) {
@@ -1151,55 +1154,39 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		}
 	} else {
 		atomic_long_or(RWSEM_FLAG_WAITERS, &sem->count);
+		if (rwsem_try_write_lock(sem, &waiter))
+			waiter.task = NULL;
 	}
+	raw_spin_unlock_irq(&sem->wait_lock);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);
 
 	for (;;) {
-		if (rwsem_try_write_lock(sem, &waiter)) {
-			/* rwsem_try_write_lock() implies ACQUIRE on success */
+		set_current_state(state);
+		if (!smp_load_acquire(&waiter.task)) {
+			/* Matches rwsem_waiter_wake()'s smp_store_release(). */
 			break;
 		}
-
-		raw_spin_unlock_irq(&sem->wait_lock);
-
-		if (signal_pending_state(state, current))
-			goto out_nolock;
-
-		/*
-		 * After setting the handoff bit and failing to acquire
-		 * the lock, attempt to spin on owner to accelerate lock
-		 * transfer. If the previous owner is a on-cpu writer and it
-		 * has just released the lock, OWNER_NULL will be returned.
-		 * In this case, we attempt to acquire the lock again
-		 * without sleeping.
-		 */
-		if (waiter.handoff_set) {
-			enum owner_state owner_state;
-
-			owner_state = rwsem_spin_on_owner(sem);
-			if (owner_state == OWNER_NULL)
-				goto trylock_again;
+		if (signal_pending_state(state, current)) {
+			raw_spin_lock_irq(&sem->wait_lock);
+			if (waiter.task)
+				goto out_nolock;
+			raw_spin_unlock_irq(&sem->wait_lock);
+			/* Ordered by sem->wait_lock against rwsem_mark_wake(). */
+			break;
 		}
-
 		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_writer);
-		set_current_state(state);
-trylock_again:
-		raw_spin_lock_irq(&sem->wait_lock);
 	}
 	__set_current_state(TASK_RUNNING);
-	raw_spin_unlock_irq(&sem->wait_lock);
 	lockevent_inc(rwsem_wlock);
 	trace_contention_end(sem, 0);
 	return sem;
 
 out_nolock:
-	__set_current_state(TASK_RUNNING);
-	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
+	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_wlock_fail);
 	trace_contention_end(sem, -EINTR);
 	return ERR_PTR(-EINTR);
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, Waiman Long
Subject: [PATCH v2 4/8] locking/rwsem: Simplify rwsem_writer_wake()
Date: Mon, 27 Mar 2023 16:24:09 -0400
Message-Id: <20230327202413.1955856-5-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>

From: Peter Zijlstra

Since @waiter := rwsem_first_waiter(sem) holds here, the separate
first-waiter handling is redundant; simplify things.
Suggested-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/locking/rwsem.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 0cc0aa566a6b..225e86326ea4 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -426,26 +426,12 @@ rwsem_waiter_wake(struct rwsem_waiter *waiter, struct wake_q_head *wake_q)
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 					struct rwsem_waiter *waiter)
 {
-
-	struct rwsem_waiter *first = rwsem_first_waiter(sem);
 	long count, new;
 
 	lockdep_assert_held(&sem->wait_lock);
 
 	count = atomic_long_read(&sem->count);
 	do {
-		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
-
-		if (has_handoff) {
-			/*
-			 * Honor handoff bit and yield only when the first
-			 * waiter is the one that set it. Otherwisee, we
-			 * still try to acquire the rwsem.
-			 */
-			if (first->handoff_set && (waiter != first))
-				return false;
-		}
-
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
@@ -454,8 +440,9 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 			 * if it is an RT task or wait in the wait queue
 			 * for too long.
 			 */
-			if (has_handoff || (!rt_task(waiter->task) &&
-					    !time_after(jiffies, waiter->timeout)))
+			if ((count & RWSEM_FLAG_HANDOFF) ||
+			    (!rt_task(waiter->task) &&
+			     !time_after(jiffies, waiter->timeout)))
 				return false;
 
 			new |= RWSEM_FLAG_HANDOFF;
@@ -474,7 +461,7 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		first->handoff_set = true;
+		waiter->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 5/8] locking/rwsem: Split out rwsem_reader_wake()
Date: Mon, 27 Mar 2023 16:24:10 -0400
Message-Id: <20230327202413.1955856-6-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>
From: Peter Zijlstra

Split the reader wakeup path out of rwsem_mark_wake() to provide
symmetry with rwsem_writer_wake().

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/locking/rwsem.c | 83 +++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 37 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 225e86326ea4..0bc262dc77fd 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -106,9 +106,9 @@
  *	atomic_long_cmpxchg() will be used to obtain writer lock.
  *
  * There are three places where the lock handoff bit may be set or cleared.
- * 1) rwsem_mark_wake() for readers		-- set, clear
+ * 1) rwsem_reader_wake() for readers		-- set, clear
  * 2) rwsem_try_write_lock() for writers	-- set, clear
- * 3) rwsem_del_waiter()			-- clear
+ * 3) rwsem_del_waiter()			-- clear
 *
 * For all the above cases, wait_lock will be held. A writer must also
 * be the first one in the wait_list to be eligible for setting the handoff
@@ -377,7 +377,7 @@ rwsem_add_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
 /*
  * Remove a waiter from the wait_list and clear flags.
  *
- * Both rwsem_mark_wake() and rwsem_try_write_lock() contain a full 'copy' of
+ * Both rwsem_reader_wake() and rwsem_try_write_lock() contain a full 'copy' of
  * this function. Modify with care.
  *
  * Return: true if wait_list isn't empty and false otherwise
@@ -483,42 +483,15 @@ static void rwsem_writer_wake(struct rw_semaphore *sem,
 	rwsem_waiter_wake(waiter, wake_q);
 }
 
-/*
- * handle the lock release when processes blocked on it that can now run
- * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
- *   have been set.
- * - there must be someone on the queue
- * - the wait_lock must be held by the caller
- * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
- *   to actually wakeup the blocked task(s) and drop the reference count,
- *   preferably when the wait_lock is released
- * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only marked woken if downgrading is false
- *
- * Implies rwsem_del_waiter() for all woken readers.
- */
-static void rwsem_mark_wake(struct rw_semaphore *sem,
-			    enum rwsem_wake_type wake_type,
-			    struct wake_q_head *wake_q)
+static void rwsem_reader_wake(struct rw_semaphore *sem,
+			      enum rwsem_wake_type wake_type,
+			      struct rwsem_waiter *waiter,
+			      struct wake_q_head *wake_q)
 {
-	struct rwsem_waiter *waiter, *tmp;
 	long count, woken = 0, adjustment = 0;
+	struct rwsem_waiter *tmp;
 	struct list_head wlist;
 
-	lockdep_assert_held(&sem->wait_lock);
-
-	/*
-	 * Take a peek at the queue head waiter such that we can determine
-	 * the wakeup(s) to perform.
-	 */
-	waiter = rwsem_first_waiter(sem);
-
-	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
-		if (wake_type == RWSEM_WAKE_ANY)
-			rwsem_writer_wake(sem, waiter, wake_q);
-		return;
-	}
-
 	/*
 	 * No reader wakeup if there are too many of them already.
 	 */
@@ -634,6 +607,42 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		rwsem_waiter_wake(waiter, wake_q);
 }
 
+/*
+ * handle the lock release when processes blocked on it that can now run
+ * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
+ *   have been set.
+ * - there must be someone on the queue
+ * - the wait_lock must be held by the caller
+ * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
+ *   to actually wakeup the blocked task(s) and drop the reference count,
+ *   preferably when the wait_lock is released
+ * - woken process blocks are discarded from the list after having task zeroed
+ * - writers are only marked woken if downgrading is false
+ *
+ * Implies rwsem_del_waiter() for all woken waiters.
+ */
+static void rwsem_mark_wake(struct rw_semaphore *sem,
+			    enum rwsem_wake_type wake_type,
+			    struct wake_q_head *wake_q)
+{
+	struct rwsem_waiter *waiter;
+
+	lockdep_assert_held(&sem->wait_lock);
+
+	/*
+	 * Take a peek at the queue head waiter such that we can determine
+	 * the wakeup(s) to perform.
+	 */
+	waiter = rwsem_first_waiter(sem);
+
+	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+		if (wake_type == RWSEM_WAKE_ANY)
+			rwsem_writer_wake(sem, waiter, wake_q);
+	} else {
+		rwsem_reader_wake(sem, wake_type, waiter, wake_q);
+	}
+}
+
 /*
  * Remove a waiter and try to wake up other waiters in the wait queue
  * This function is called from the out_nolock path of both the reader and
@@ -1022,8 +1031,8 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	if ((rcnt == 1) && (count & RWSEM_FLAG_WAITERS)) {
 		raw_spin_lock_irq(&sem->wait_lock);
 		if (!list_empty(&sem->wait_list))
-			rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
-					&wake_q);
+			rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED, &wake_q);
+
 		raw_spin_unlock_irq(&sem->wait_lock);
 		wake_up_q(&wake_q);
 	}
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 6/8] locking/rwsem: Unify wait loop
Date: Mon, 27 Mar 2023 16:24:11 -0400
Message-Id: <20230327202413.1955856-7-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>

From: Peter Zijlstra

Now that the reader and writer wait loops are identical, share the
code.
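The shared loop hinges on the waiter->task handshake. As a userspace
analogue (illustrative names, not the kernel API): the waker clears the
task pointer with a release store after taking its own reference, so the
sleeper's acquire load of NULL means "the lock is yours, stop waiting".

	#include <stdatomic.h>
	#include <stdbool.h>

	struct waiter {
		_Atomic(void *) task;	/* non-NULL while still waiting */
	};

	/* Waker side: publish the grant with a release store. */
	static void waiter_wake(struct waiter *w)
	{
		atomic_store_explicit(&w->task, NULL, memory_order_release);
	}

	/* Sleeper side: pairs with the release store in waiter_wake(). */
	static bool waiter_granted(struct waiter *w)
	{
		return atomic_load_explicit(&w->task,
					    memory_order_acquire) == NULL;
	}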
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/locking/rwsem.c | 121 +++++++++++++++++------------------------
 1 file changed, 51 insertions(+), 70 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 0bc262dc77fd..ee8861effcc2 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -650,13 +650,11 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
  * optionally wake up waiters before it returns.
  */
 static inline void
-rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
-		      struct wake_q_head *wake_q)
+rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
 					__releases(&sem->wait_lock)
 {
 	bool first = rwsem_first_waiter(sem) == waiter;
-
-	wake_q_init(wake_q);
+	DEFINE_WAKE_Q(wake_q);
 
 	/*
 	 * If the wait_list isn't empty and the waiter to be deleted is
@@ -664,10 +662,10 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 	 * be eligible to acquire or spin on the lock.
 	 */
 	if (rwsem_del_waiter(sem, waiter) && first)
-		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, wake_q);
+		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
-	if (!wake_q_empty(wake_q))
-		wake_up_q(wake_q);
+	if (!wake_q_empty(&wake_q))
+		wake_up_q(&wake_q);
 }
 
 /*
@@ -997,6 +995,50 @@ static inline void rwsem_cond_wake_waiter(struct rw_semaphore *sem, long count,
 	rwsem_mark_wake(sem, wake_type, wake_q);
 }
 
+#define lockevent_rw_inc(rd, evr, evw)	do {	\
+	lockevent_cond_inc(evr, (rd));		\
+	lockevent_cond_inc(evw, !(rd));		\
+} while (0)
+
+static __always_inline struct rw_semaphore *
+rwsem_waiter_wait(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
+		  int state, bool reader)
+{
+	trace_contention_begin(sem, reader ? LCB_F_READ : LCB_F_WRITE);
+
+	/* wait to be given the lock */
+	for (;;) {
+		set_current_state(state);
+		if (!smp_load_acquire(&waiter->task)) {
+			/* Matches rwsem_waiter_wake()'s smp_store_release(). */
+			break;
+		}
+		if (signal_pending_state(state, current)) {
+			raw_spin_lock_irq(&sem->wait_lock);
+			if (waiter->task)
+				goto out_nolock;
+			raw_spin_unlock_irq(&sem->wait_lock);
+			/* Ordered by sem->wait_lock against rwsem_mark_wake(). */
+			break;
+		}
+		schedule_preempt_disabled();
+		lockevent_rw_inc(reader, rwsem_sleep_reader, rwsem_sleep_writer);
+	}
+
+	__set_current_state(TASK_RUNNING);
+
+	lockevent_rw_inc(reader, rwsem_rlock, rwsem_wlock);
+	trace_contention_end(sem, 0);
+	return sem;
+
+out_nolock:
+	rwsem_del_wake_waiter(sem, waiter);
+	__set_current_state(TASK_RUNNING);
+	lockevent_rw_inc(reader, rwsem_rlock_fail, rwsem_wlock_fail);
+	trace_contention_end(sem, -EINTR);
+	return ERR_PTR(-EINTR);
+}
+
 /*
  * Wait for the read lock to be granted
  */
@@ -1074,38 +1116,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	if (!wake_q_empty(&wake_q))
 		wake_up_q(&wake_q);
 
-	trace_contention_begin(sem, LCB_F_READ);
-
-	/* wait to be given the lock */
-	for (;;) {
-		set_current_state(state);
-		if (!smp_load_acquire(&waiter.task)) {
-			/* Matches rwsem_waiter_wake()'s smp_store_release(). */
-			break;
-		}
-		if (signal_pending_state(state, current)) {
-			raw_spin_lock_irq(&sem->wait_lock);
-			if (waiter.task)
-				goto out_nolock;
-			raw_spin_unlock_irq(&sem->wait_lock);
-			/* Ordered by sem->wait_lock against rwsem_mark_wake(). */
-			break;
-		}
-		schedule_preempt_disabled();
-		lockevent_inc(rwsem_sleep_reader);
-	}
-
-	__set_current_state(TASK_RUNNING);
-	lockevent_inc(rwsem_rlock);
-	trace_contention_end(sem, 0);
-	return sem;
-
-out_nolock:
-	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
-	__set_current_state(TASK_RUNNING);
-	lockevent_inc(rwsem_rlock_fail);
-	trace_contention_end(sem, -EINTR);
-	return ERR_PTR(-EINTR);
+	return rwsem_waiter_wait(sem, &waiter, state, true);
 }
 
 /*
@@ -1155,37 +1166,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	/* wait until we successfully acquire the lock */
-	trace_contention_begin(sem, LCB_F_WRITE);
-
-	for (;;) {
-		set_current_state(state);
-		if (!smp_load_acquire(&waiter.task)) {
-			/* Matches rwsem_waiter_wake()'s smp_store_release(). */
-			break;
-		}
-		if (signal_pending_state(state, current)) {
-			raw_spin_lock_irq(&sem->wait_lock);
-			if (waiter.task)
-				goto out_nolock;
-			raw_spin_unlock_irq(&sem->wait_lock);
-			/* Ordered by sem->wait_lock against rwsem_mark_wake(). */
-			break;
-		}
-		schedule_preempt_disabled();
-		lockevent_inc(rwsem_sleep_writer);
-	}
-	__set_current_state(TASK_RUNNING);
-	lockevent_inc(rwsem_wlock);
-	trace_contention_end(sem, 0);
-	return sem;
-
-out_nolock:
-	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
-	__set_current_state(TASK_RUNNING);
-	lockevent_inc(rwsem_wlock_fail);
-	trace_contention_end(sem, -EINTR);
-	return ERR_PTR(-EINTR);
+	return rwsem_waiter_wait(sem, &waiter, state, false);
 }
 
 /*
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 7/8] locking/rwsem: Use the force
Date: Mon, 27 Mar 2023 16:24:12 -0400
Message-Id: <20230327202413.1955856-8-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>
From: Peter Zijlstra

Now that the writer adjustment is done from the wakeup side and the
HANDOFF bit guarantees that spinning/stealing is disabled, use the
combined guarantee to ignore spurious READER_BIAS and directly claim
the lock.
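The count arithmetic can be checked with a small standalone program (the
flag values below are illustrative; the real layout lives in
kernel/locking/rwsem.c). With HANDOFF set, a single queued waiter, and
one spurious READER_BIAS in flight:

	#include <assert.h>

	#define WRITER_LOCKED	0x001UL
	#define FLAG_WAITERS	0x002UL
	#define FLAG_HANDOFF	0x004UL
	#define READER_BIAS	0x100UL

	int main(void)
	{
		unsigned long count = FLAG_HANDOFF | FLAG_WAITERS | READER_BIAS;
		unsigned long adjustment = WRITER_LOCKED - FLAG_HANDOFF;

		adjustment -= FLAG_WAITERS;	/* list_is_singular() case */
		count += adjustment;		/* unsigned wraparound is fine */

		/*
		 * HANDOFF and WAITERS are gone, the writer bit is set, and
		 * the stray READER_BIAS remains only until its reader backs
		 * it out again.
		 */
		assert(count == (WRITER_LOCKED | READER_BIAS));
		return 0;
	}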
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/locking/rwsem.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ee8861effcc2..7bd26e64827a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -377,8 +377,8 @@ rwsem_add_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
 /*
  * Remove a waiter from the wait_list and clear flags.
  *
- * Both rwsem_reader_wake() and rwsem_try_write_lock() contain a full 'copy' of
- * this function. Modify with care.
+ * Both rwsem_{reader,writer}_wake() and rwsem_try_write_lock() contain a full
+ * 'copy' of this function. Modify with care.
 *
 * Return: true if wait_list isn't empty and false otherwise
 */
@@ -479,8 +479,33 @@ static void rwsem_writer_wake(struct rw_semaphore *sem,
 			      struct rwsem_waiter *waiter,
 			      struct wake_q_head *wake_q)
 {
-	if (rwsem_try_write_lock(sem, waiter))
-		rwsem_waiter_wake(waiter, wake_q);
+	long count = atomic_long_read(&sem->count);
+
+	/*
+	 * Since rwsem_mark_wake() is only called (with WAKE_ANY) when
+	 * the lock is unlocked, and the HANDOFF bit guarantees that
+	 * all spinning / stealing is disabled, it is posssible to
+	 * unconditionally claim the lock -- any READER_BIAS will be
+	 * temporary.
+	 */
+	if ((count & (RWSEM_FLAG_HANDOFF|RWSEM_WRITER_LOCKED)) == RWSEM_FLAG_HANDOFF) {
+		unsigned long adjustment = RWSEM_WRITER_LOCKED - RWSEM_FLAG_HANDOFF;
+
+		if (list_is_singular(&sem->wait_list))
+			adjustment -= RWSEM_FLAG_WAITERS;
+
+		atomic_long_add(adjustment, &sem->count);
+		/*
+		 * Have rwsem_writer_wake() fully imply rwsem_del_waiter() on
+		 * success.
+		 */
+		list_del(&waiter->list);
+		atomic_long_set(&sem->owner, (long)waiter->task);
+
+	} else if (!rwsem_try_write_lock(sem, waiter))
+		return;
+
+	rwsem_waiter_wake(waiter, wake_q);
 }
 
 static void rwsem_reader_wake(struct rw_semaphore *sem,
-- 
2.31.1

From nobody Wed Sep 10 03:09:07 2025
From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, Waiman Long
Subject: [PATCH v2 8/8] locking/rwsem: Restore old write lock slow path behavior
Date: Mon, 27 Mar 2023 16:24:13 -0400
Message-Id: <20230327202413.1955856-9-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>
References: <20230327202413.1955856-1-longman@redhat.com>

An earlier commit ("locking/rwsem: Rework writer wakeup") moved writer
lock acquisition into the wakeup path only. This can result in an almost
immediate transfer of write lock ownership after an unlock, leaving
little time for lock stealing, and long before the new write lock owner
wakes up and runs its critical section. As a result, write lock stealing
via optimistic spinning is greatly suppressed.

With CONFIG_LOCK_EVENT_COUNTS enabled and a rwsem locking microbenchmark
running on a 2-socket x86-64 test machine for 15s, the locking rate
dropped to about 30% of the pre-change value (584,091 ops/s before
vs. 171,184 ops/s after). The total number of lock stealings within the
testing period fell to less than 1% of the previous count (4,252,147
vs. 17,939) [1].

To restore the lost performance, this patch restores the old write lock
slow path behavior of acquiring the lock after the waiter has been woken
up, with the exception that lock transfer may still happen in the wakeup
path if the HANDOFF bit has been set. In addition, the waiter that sets
the HANDOFF bit will still try to spin on the lock owner if it is on cpu.

With this patch, the locking rate is back up to 580,256 ops/s, which is
almost the same as before.

[1] https://lore.kernel.org/lkml/c126f079-88a2-4067-6f94-82f51cf5ff2b@redhat.com/
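For orientation before the diff, the writer side of the shared wait loop
ends up with roughly this shape (condensed from the patch below; the
signal check is unchanged and elided here):

	for (;;) {
		set_current_state(state);
		if (!smp_load_acquire(&waiter->task))
			break;	/* lock was handed over in the wakeup path */

		if (!reader) {
			/* Writer still needs to do a trylock here. */
			raw_spin_lock_irq(&sem->wait_lock);
			if (waiter->task && rwsem_try_write_lock(sem, waiter))
				waiter->task = NULL;
			raw_spin_unlock_irq(&sem->wait_lock);
			if (!smp_load_acquire(&waiter->task))
				break;	/* acquired it ourselves */

			if (waiter->handoff_set) {
				/* Spin on the owner to speed up transfer. */
				rwsem_spin_on_owner(sem);
				if (!smp_load_acquire(&waiter->task))
					break;
			}
		}
		/* ... signal check as before ... */
		schedule_preempt_disabled();
	}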
Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 34 +++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 7bd26e64827a..cf9dc1e250e0 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -426,6 +426,7 @@ rwsem_waiter_wake(struct rwsem_waiter *waiter, struct wake_q_head *wake_q)
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 					struct rwsem_waiter *waiter)
 {
+	bool first = (rwsem_first_waiter(sem) == waiter);
 	long count, new;
 
 	lockdep_assert_held(&sem->wait_lock);
@@ -434,6 +435,9 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	do {
 		new = count;
 
+		if (!first && (count & (RWSEM_FLAG_HANDOFF | RWSEM_LOCK_MASK)))
+			return false;
+
 		if (count & RWSEM_LOCK_MASK) {
 			/*
 			 * A waiter (first or not) can set the handoff bit
@@ -501,11 +505,18 @@ static void rwsem_writer_wake(struct rw_semaphore *sem,
 	 */
 	list_del(&waiter->list);
 	atomic_long_set(&sem->owner, (long)waiter->task);
-
-	} else if (!rwsem_try_write_lock(sem, waiter))
+		rwsem_waiter_wake(waiter, wake_q);
 		return;
+	}
 
-	rwsem_waiter_wake(waiter, wake_q);
+	/*
+	 * Mark writer at the front of the queue for wakeup.
+	 * Until the task is actually awoken later by the caller, other
+	 * writers are able to steal it. Readers, on the other hand, will
+	 * block as they will notice the queued writer.
+	 */
+	wake_q_add(wake_q, waiter->task);
+	lockevent_inc(rwsem_wake_writer);
 }
 
 static void rwsem_reader_wake(struct rw_semaphore *sem,
@@ -1038,6 +1049,23 @@ rwsem_waiter_wait(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 			/* Matches rwsem_waiter_wake()'s smp_store_release(). */
 			break;
 		}
+		if (!reader) {
+			/*
+			 * Writer still needs to do a trylock here
+			 */
+			raw_spin_lock_irq(&sem->wait_lock);
+			if (waiter->task && rwsem_try_write_lock(sem, waiter))
+				waiter->task = NULL;
+			raw_spin_unlock_irq(&sem->wait_lock);
+			if (!smp_load_acquire(&waiter->task))
+				break;
+
+			if (waiter->handoff_set) {
+				rwsem_spin_on_owner(sem);
+				if (!smp_load_acquire(&waiter->task))
+					break;
+			}
+		}
 		if (signal_pending_state(state, current)) {
 			raw_spin_lock_irq(&sem->wait_lock);
 			if (waiter->task)
-- 
2.31.1