From nobody Wed Sep 10 06:16:09 2025
Message-ID: <20230223123319.367721619@infradead.org>
Date: Thu, 23 Feb 2023 13:26:43 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: longman@redhat.com, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Subject: [PATCH 1/6] locking/rwsem: Minor code refactoring in rwsem_mark_wake()
References: <20230223122642.491637862@infradead.org>

From: Waiman Long <longman@redhat.com>

Rename "oldcount" to "count" as it is not always the old count value.
Also do some minor code refactoring to reduce indentation. There is no
functional change.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230216210933.1169097-2-longman@redhat.com
---
 kernel/locking/rwsem.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -40,7 +40,7 @@
  *
  * When the rwsem is reader-owned and a spinning writer has timed out,
  * the nonspinnable bit will be set to disable optimistic spinning.
- 
+ *
  * When a writer acquires a rwsem, it puts its task_struct pointer
  * into the owner field. It is cleared after an unlock.
  *
@@ -413,7 +413,7 @@ static void rwsem_mark_wake(struct rw_se
                            struct wake_q_head *wake_q)
 {
        struct rwsem_waiter *waiter, *tmp;
-       long oldcount, woken = 0, adjustment = 0;
+       long count, woken = 0, adjustment = 0;
        struct list_head wlist;
 
        lockdep_assert_held(&sem->wait_lock);
@@ -424,22 +424,23 @@ static void rwsem_mark_wake(struct rw_se
         */
        waiter = rwsem_first_waiter(sem);
 
-       if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
-               if (wake_type == RWSEM_WAKE_ANY) {
-                       /*
-                        * Mark writer at the front of the queue for wakeup.
-                        * Until the task is actually later awoken later by
-                        * the caller, other writers are able to steal it.
-                        * Readers, on the other hand, will block as they
-                        * will notice the queued writer.
-                        */
-                       wake_q_add(wake_q, waiter->task);
-                       lockevent_inc(rwsem_wake_writer);
-               }
+       if (waiter->type != RWSEM_WAITING_FOR_WRITE)
+               goto wake_readers;
 
-               return;
+       if (wake_type == RWSEM_WAKE_ANY) {
+               /*
+                * Mark writer at the front of the queue for wakeup.
+                * Until the task is actually later awoken later by
+                * the caller, other writers are able to steal it.
+                * Readers, on the other hand, will block as they
+                * will notice the queued writer.
+                */
+               wake_q_add(wake_q, waiter->task);
+               lockevent_inc(rwsem_wake_writer);
        }
+       return;
 
+wake_readers:
        /*
         * No reader wakeup if there are too many of them already.
         */
@@ -455,15 +456,15 @@ static void rwsem_mark_wake(struct rw_se
                struct task_struct *owner;
 
                adjustment = RWSEM_READER_BIAS;
-               oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-               if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
+               count = atomic_long_fetch_add(adjustment, &sem->count);
+               if (unlikely(count & RWSEM_WRITER_MASK)) {
                        /*
                         * When we've been waiting "too" long (for writers
                         * to give up the lock), request a HANDOFF to
                         * force the issue.
                         */
                        if (time_after(jiffies, waiter->timeout)) {
-                               if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+                               if (!(count & RWSEM_FLAG_HANDOFF)) {
                                        adjustment -= RWSEM_FLAG_HANDOFF;
                                        lockevent_inc(rwsem_rlock_handoff);
                                }
@@ -524,21 +525,21 @@ static void rwsem_mark_wake(struct rw_se
        adjustment = woken * RWSEM_READER_BIAS - adjustment;
        lockevent_cond_inc(rwsem_wake_reader, woken);
 
-       oldcount = atomic_long_read(&sem->count);
+       count = atomic_long_read(&sem->count);
        if (list_empty(&sem->wait_list)) {
                /*
                 * Combined with list_move_tail() above, this implies
                 * rwsem_del_waiter().
                 */
                adjustment -= RWSEM_FLAG_WAITERS;
-               if (oldcount & RWSEM_FLAG_HANDOFF)
+               if (count & RWSEM_FLAG_HANDOFF)
                        adjustment -= RWSEM_FLAG_HANDOFF;
        } else if (woken) {
                /*
                 * When we've woken a reader, we no longer need to force
                 * writers to give up the lock and we can clear HANDOFF.
                 */
-               if (oldcount & RWSEM_FLAG_HANDOFF)
+               if (count & RWSEM_FLAG_HANDOFF)
                        adjustment -= RWSEM_FLAG_HANDOFF;
        }
 
@@ -844,7 +845,6 @@ static bool rwsem_optimistic_spin(struct
                 * Try to acquire the lock
                 */
                taken = rwsem_try_write_lock_unqueued(sem);
-
                if (taken)
                        break;
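
A note on the rename: the value returned by atomic_long_fetch_add() really
is the pre-addition count, so "oldcount" was accurate at its first
assignment, but the same variable is later reassigned from
atomic_long_read(), where it simply holds the current count. That
distinction in a minimal C11 sketch (userspace stdatomic standing in for
the kernel's atomic_long_t API):

	#include <stdatomic.h>
	#include <stdio.h>

	int main(void)
	{
		atomic_long count = 8;

		/* fetch_add returns the value *before* the addition ... */
		long old = atomic_fetch_add(&count, 4);	/* old == 8, count == 12 */

		/* ... while a plain load returns the current value. */
		long cur = atomic_load(&count);		/* cur == 12 */

		printf("old=%ld cur=%ld\n", old, cur);
		return 0;
	}
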
From nobody Wed Sep 10 06:16:09 2025
Message-ID: <20230223123319.428168721@infradead.org>
Date: Thu, 23 Feb 2023 13:26:44 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: longman@redhat.com, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Subject: [PATCH 2/6] locking/rwsem: Enforce queueing when HANDOFF
References: <20230223122642.491637862@infradead.org>

Ensure that HANDOFF disables all spinning and stealing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwsem.c | 9 +++++++++
 1 file changed, 9 insertions(+)

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -468,7 +468,12 @@ static void rwsem_mark_wake(struct rw_se
                                        adjustment -= RWSEM_FLAG_HANDOFF;
                                        lockevent_inc(rwsem_rlock_handoff);
                                }
+                               /*
+                                * With HANDOFF set for reader, we must
+                                * terminate all spinning.
+                                */
                                waiter->handoff_set = true;
+                               rwsem_set_nonspinnable(sem);
                        }
 
                        atomic_long_add(-adjustment, &sem->count);
@@ -755,6 +760,10 @@ rwsem_spin_on_owner(struct rw_semaphore
 
        owner = rwsem_owner_flags(sem, &flags);
        state = rwsem_owner_state(owner, flags);
+
+       if (owner == current)
+               return OWNER_NONSPINNABLE; /* Handoff granted */
+
        if (state != OWNER_WRITER)
                return state;
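
The invariant being enforced here, in miniature: once a HANDOFF bit is set
in the lock word, every optimistic (spinning or stealing) acquisition path
must fail, so the lock can only go to the designated first waiter. A
simplified userspace sketch of such a steal-proof trylock (C11 atomics;
the flag names are illustrative, not the kernel's):

	#include <stdatomic.h>
	#include <stdbool.h>

	#define LOCKED  0x1UL
	#define HANDOFF 0x2UL	/* queue head has priority; no stealing */

	static atomic_ulong lock_word;

	/* Fast-path/spinner acquisition: must never steal past HANDOFF. */
	static bool trylock_nosteal(void)
	{
		unsigned long old = atomic_load_explicit(&lock_word,
							 memory_order_relaxed);
		do {
			if (old & (LOCKED | HANDOFF))
				return false;	/* held, or promised to queue head */
		} while (!atomic_compare_exchange_weak_explicit(&lock_word, &old,
								old | LOCKED,
								memory_order_acquire,
								memory_order_relaxed));
		return true;
	}

	int main(void)
	{
		bool first = trylock_nosteal();		/* true: word was clear */
		atomic_fetch_or(&lock_word, HANDOFF);
		bool second = trylock_nosteal();	/* false: locked + handoff */
		atomic_fetch_and(&lock_word, ~LOCKED);
		bool third = trylock_nosteal();		/* still false: HANDOFF holds
							   even though unlocked */
		return !(first && !second && !third);
	}
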
From nobody Wed Sep 10 06:16:09 2025
Message-ID: <20230223123319.487908155@infradead.org>
Date: Thu, 23 Feb 2023 13:26:45 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: longman@redhat.com, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Subject: [PATCH 3/6] locking/rwsem: Rework writer wakeup
References: <20230223122642.491637862@infradead.org>

Currently readers and writers have distinctly different wait/wake
methods. For readers the ->count adjustment happens on the wakeup side,
while for writers the ->count adjustment happens on the wait side.

This asymmetry is unfortunate since the wake side has an additional
guarantee -- specifically, the wake side has observed the unlocked
state, and thus it can know that speculative READER_BIAS perturbations
on ->count are just that; they will be undone.

Additionally, unifying the wait/wake methods allows sharing code.

As such, do a straightforward transform of the writer wakeup into the
wake side.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwsem.c | 253 ++++++++++++++++++++++---------------------------
 1 file changed, 115 insertions(+), 138 deletions(-)

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -107,7 +107,7 @@
  *
  * There are three places where the lock handoff bit may be set or cleared.
  * 1) rwsem_mark_wake() for readers		-- set, clear
- * 2) rwsem_try_write_lock() for writers	-- set, clear
+ * 2) rwsem_writer_wake() for writers		-- set, clear
  * 3) rwsem_del_waiter()			-- clear
  *
  * For all the above cases, wait_lock will be held. A writer must also
@@ -377,7 +377,7 @@ rwsem_add_waiter(struct rw_semaphore *se
 /*
  * Remove a waiter from the wait_list and clear flags.
  *
- * Both rwsem_mark_wake() and rwsem_try_write_lock() contain a full 'copy' of
+ * Both rwsem_mark_wake() and rwsem_writer_wake() contain a full 'copy' of
  * this function. Modify with care.
  *
  * Return: true if wait_list isn't empty and false otherwise
@@ -394,6 +394,100 @@ rwsem_del_waiter(struct rw_semaphore *se
        return false;
 }
 
+static inline void
+rwsem_waiter_wake(struct rwsem_waiter *waiter, struct wake_q_head *wake_q)
+{
+       struct task_struct *tsk;
+
+       tsk = waiter->task;
+       get_task_struct(tsk);
+
+       /*
+        * Ensure calling get_task_struct() before setting the reader
+        * waiter to nil such that rwsem_down_read_slowpath() cannot
+        * race with do_exit() by always holding a reference count
+        * to the task to wakeup.
+        */
+       smp_store_release(&waiter->task, NULL);
+       /*
+        * Ensure issuing the wakeup (either by us or someone else)
+        * after setting the reader waiter to nil.
+        */
+       wake_q_add_safe(wake_q, tsk);
+}
+
+/*
+ * This function must be called with the sem->wait_lock held to prevent
+ * race conditions between checking the rwsem wait list and setting the
+ * sem->count accordingly.
+ *
+ * Implies rwsem_del_waiter() on success.
+ */
+static void rwsem_writer_wake(struct rw_semaphore *sem,
+                             struct rwsem_waiter *waiter,
+                             struct wake_q_head *wake_q)
+{
+       struct rwsem_waiter *first = rwsem_first_waiter(sem);
+       long count, new;
+
+       lockdep_assert_held(&sem->wait_lock);
+
+       count = atomic_long_read(&sem->count);
+       do {
+               bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
+
+               if (has_handoff) {
+                       /*
+                        * Honor handoff bit and yield only when the first
+                        * waiter is the one that set it. Otherwisee, we
+                        * still try to acquire the rwsem.
+                        */
+                       if (first->handoff_set && (waiter != first))
+                               return;
+               }
+
+               new = count;
+
+               if (count & RWSEM_LOCK_MASK) {
+                       /*
+                        * A waiter (first or not) can set the handoff bit
+                        * if it is an RT task or wait in the wait queue
+                        * for too long.
+                        */
+                       if (has_handoff || (!rt_task(waiter->task) &&
+                                           !time_after(jiffies, waiter->timeout)))
+                               return;
+
+                       new |= RWSEM_FLAG_HANDOFF;
+               } else {
+                       new |= RWSEM_WRITER_LOCKED;
+                       new &= ~RWSEM_FLAG_HANDOFF;
+
+                       if (list_is_singular(&sem->wait_list))
+                               new &= ~RWSEM_FLAG_WAITERS;
+               }
+       } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
+
+       /*
+        * We have either acquired the lock with handoff bit cleared or set
+        * the handoff bit. Only the first waiter can have its handoff_set
+        * set here to enable optimistic spinning in slowpath loop.
+        */
+       if (new & RWSEM_FLAG_HANDOFF) {
+               first->handoff_set = true;
+               lockevent_inc(rwsem_wlock_handoff);
+               return;
+       }
+
+       /*
+        * Have rwsem_writer_wake() fully imply rwsem_del_waiter() on
+        * success.
+        */
+       list_del(&waiter->list);
+       rwsem_set_owner(sem);
+       rwsem_waiter_wake(waiter, wake_q);
+}
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
@@ -424,23 +518,12 @@ static void rwsem_mark_wake(struct rw_se
         */
        waiter = rwsem_first_waiter(sem);
 
-       if (waiter->type != RWSEM_WAITING_FOR_WRITE)
-               goto wake_readers;
-
-       if (wake_type == RWSEM_WAKE_ANY) {
-               /*
-                * Mark writer at the front of the queue for wakeup.
-                * Until the task is actually later awoken later by
-                * the caller, other writers are able to steal it.
-                * Readers, on the other hand, will block as they
-                * will notice the queued writer.
-                */
-               wake_q_add(wake_q, waiter->task);
-               lockevent_inc(rwsem_wake_writer);
+       if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+               if (wake_type == RWSEM_WAKE_ANY)
+                       rwsem_writer_wake(sem, waiter, wake_q);
+               return;
        }
-       return;
 
-wake_readers:
        /*
         * No reader wakeup if there are too many of them already.
         */
@@ -547,25 +630,8 @@ static void rwsem_mark_wake(struct rw_se
        atomic_long_add(adjustment, &sem->count);
 
        /* 2nd pass */
-       list_for_each_entry_safe(waiter, tmp, &wlist, list) {
-               struct task_struct *tsk;
-
-               tsk = waiter->task;
-               get_task_struct(tsk);
-
-               /*
-                * Ensure calling get_task_struct() before setting the reader
-                * waiter to nil such that rwsem_down_read_slowpath() cannot
-                * race with do_exit() by always holding a reference count
-                * to the task to wakeup.
-                */
-               smp_store_release(&waiter->task, NULL);
-               /*
-                * Ensure issuing the wakeup (either by us or someone else)
-                * after setting the reader waiter to nil.
-                */
-               wake_q_add_safe(wake_q, tsk);
-       }
+       list_for_each_entry_safe(waiter, tmp, &wlist, list)
+               rwsem_waiter_wake(waiter, wake_q);
 }
 
 /*
@@ -596,77 +662,6 @@ rwsem_del_wake_waiter(struct rw_semaphor
 }
 
 /*
- * This function must be called with the sem->wait_lock held to prevent
- * race conditions between checking the rwsem wait list and setting the
- * sem->count accordingly.
- *
- * Implies rwsem_del_waiter() on success.
- */
-static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-                                       struct rwsem_waiter *waiter)
-{
-       struct rwsem_waiter *first = rwsem_first_waiter(sem);
-       long count, new;
-
-       lockdep_assert_held(&sem->wait_lock);
-
-       count = atomic_long_read(&sem->count);
-       do {
-               bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
-
-               if (has_handoff) {
-                       /*
-                        * Honor handoff bit and yield only when the first
-                        * waiter is the one that set it. Otherwisee, we
-                        * still try to acquire the rwsem.
-                        */
-                       if (first->handoff_set && (waiter != first))
-                               return false;
-               }
-
-               new = count;
-
-               if (count & RWSEM_LOCK_MASK) {
-                       /*
-                        * A waiter (first or not) can set the handoff bit
-                        * if it is an RT task or wait in the wait queue
-                        * for too long.
-                        */
-                       if (has_handoff || (!rt_task(waiter->task) &&
-                                           !time_after(jiffies, waiter->timeout)))
-                               return false;
-
-                       new |= RWSEM_FLAG_HANDOFF;
-               } else {
-                       new |= RWSEM_WRITER_LOCKED;
-                       new &= ~RWSEM_FLAG_HANDOFF;
-
-                       if (list_is_singular(&sem->wait_list))
-                               new &= ~RWSEM_FLAG_WAITERS;
-               }
-       } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
-
-       /*
-        * We have either acquired the lock with handoff bit cleared or set
-        * the handoff bit. Only the first waiter can have its handoff_set
-        * set here to enable optimistic spinning in slowpath loop.
-        */
-       if (new & RWSEM_FLAG_HANDOFF) {
-               first->handoff_set = true;
-               lockevent_inc(rwsem_wlock_handoff);
-               return false;
-       }
-
-       /*
-        * Have rwsem_try_write_lock() fully imply rwsem_del_waiter() on
-        * success.
-        */
-       list_del(&waiter->list);
-       rwsem_set_owner(sem);
-       return true;
-}
-
-/*
  * The rwsem_spin_on_owner() function returns the following 4 values
  * depending on the lock owner state.
  *   OWNER_NULL  : owner is currently NULL
@@ -1072,7 +1067,7 @@ rwsem_down_read_slowpath(struct rw_semap
        for (;;) {
                set_current_state(state);
                if (!smp_load_acquire(&waiter.task)) {
-                       /* Matches rwsem_mark_wake()'s smp_store_release(). */
+                       /* Matches rwsem_waiter_wake()'s smp_store_release(). */
                        break;
                }
                if (signal_pending_state(state, current)) {
@@ -1143,54 +1138,36 @@ rwsem_down_write_slowpath(struct rw_sema
        } else {
                atomic_long_or(RWSEM_FLAG_WAITERS, &sem->count);
        }
+       raw_spin_unlock_irq(&sem->wait_lock);
 
        /* wait until we successfully acquire the lock */
-       set_current_state(state);
        trace_contention_begin(sem, LCB_F_WRITE);
 
        for (;;) {
-               if (rwsem_try_write_lock(sem, &waiter)) {
-                       /* rwsem_try_write_lock() implies ACQUIRE on success */
+               set_current_state(state);
+               if (!smp_load_acquire(&waiter.task)) {
+                       /* Matches rwsem_waiter_wake()'s smp_store_release(). */
                        break;
                }
-
-               raw_spin_unlock_irq(&sem->wait_lock);
-
-               if (signal_pending_state(state, current))
-                       goto out_nolock;
-
-               /*
-                * After setting the handoff bit and failing to acquire
-                * the lock, attempt to spin on owner to accelerate lock
-                * transfer. If the previous owner is a on-cpu writer and it
-                * has just released the lock, OWNER_NULL will be returned.
-                * In this case, we attempt to acquire the lock again
-                * without sleeping.
-                */
-               if (waiter.handoff_set) {
-                       enum owner_state owner_state;
-
-                       owner_state = rwsem_spin_on_owner(sem);
-                       if (owner_state == OWNER_NULL)
-                               goto trylock_again;
+               if (signal_pending_state(state, current)) {
+                       raw_spin_lock_irq(&sem->wait_lock);
+                       if (waiter.task)
+                               goto out_nolock;
+                       raw_spin_unlock_irq(&sem->wait_lock);
+                       /* Ordered by sem->wait_lock against rwsem_mark_wake(). */
+                       break;
                }
-
                schedule_preempt_disabled();
                lockevent_inc(rwsem_sleep_writer);
-               set_current_state(state);
-trylock_again:
-               raw_spin_lock_irq(&sem->wait_lock);
        }
        __set_current_state(TASK_RUNNING);
-       raw_spin_unlock_irq(&sem->wait_lock);
        lockevent_inc(rwsem_wlock);
        trace_contention_end(sem, 0);
        return sem;
 
 out_nolock:
-       __set_current_state(TASK_RUNNING);
-       raw_spin_lock_irq(&sem->wait_lock);
        rwsem_del_wake_waiter(sem, &waiter, &wake_q);
+       __set_current_state(TASK_RUNNING);
        lockevent_inc(rwsem_wlock_fail);
        trace_contention_end(sem, -EINTR);
        return ERR_PTR(-EINTR);
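
The handshake that makes the wake-side transfer work is the
store-release/load-acquire pairing on waiter->task described in the hunks
above: the waker fully installs ownership first, then publishes the grant
by writing NULL with release semantics; the sleeper re-checks the field
with acquire semantics, so once it sees NULL it also sees all the
ownership stores. Roughly, in portable C11 terms (a sketch of the
ordering only, not the kernel implementation):

	#include <stdatomic.h>
	#include <stddef.h>

	struct task { int pid; };	/* stand-in for task_struct */

	struct waiter {
		_Atomic(struct task *) task;	/* non-NULL while waiting */
		/* ... list linkage, waiter type, timeout ... */
	};

	/* Wake side: grant the lock, then publish the grant. */
	static void waiter_wake(struct waiter *w)
	{
		/* ... adjust lock word / set owner first ... */
		atomic_store_explicit(&w->task, NULL, memory_order_release);
		/* ... then issue the actual wakeup ... */
	}

	/* Wait side: the load-acquire pairs with the store-release, so
	 * seeing NULL implies the waker's grant stores are visible. */
	static void waiter_wait(struct waiter *w)
	{
		while (atomic_load_explicit(&w->task, memory_order_acquire))
			;	/* the kernel blocks via schedule() here */
	}

	int main(void)
	{
		struct task t = { 1 };
		struct waiter w = { &t };

		waiter_wake(&w);	/* grant + publish */
		waiter_wait(&w);	/* returns: grant already visible */
		return 0;
	}
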
From nobody Wed Sep 10 06:16:09 2025
Message-ID: <20230223123319.548254615@infradead.org>
Date: Thu, 23 Feb 2023 13:26:46 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: longman@redhat.com, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Subject: [PATCH 4/6] locking/rwsem: Split out rwsem_reader_wake()
References: <20230223122642.491637862@infradead.org>

To provide symmetry with rwsem_writer_wake().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwsem.c | 84 +++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 37 deletions(-)

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -106,9 +106,9 @@
  * atomic_long_cmpxchg() will be used to obtain writer lock.
  *
  * There are three places where the lock handoff bit may be set or cleared.
- * 1) rwsem_mark_wake() for readers		-- set, clear
+ * 1) rwsem_reader_wake() for readers		-- set, clear
  * 2) rwsem_writer_wake() for writers		-- set, clear
- * 3) rwsem_del_waiter()			-- clear
+ * 3) rwsem_del_waiter()				-- clear
  *
  * For all the above cases, wait_lock will be held. A writer must also
  * be the first one in the wait_list to be eligible for setting the handoff
@@ -377,8 +377,8 @@ rwsem_add_waiter(struct rw_semaphore *se
 /*
  * Remove a waiter from the wait_list and clear flags.
  *
- * Both rwsem_mark_wake() and rwsem_writer_wake() contain a full 'copy' of
- * this function. Modify with care.
+ * Both rwsem_{reader,writer}_wake() contain a full 'copy' of this function.
+ * Modify with care.
  *
  * Return: true if wait_list isn't empty and false otherwise
@@ -488,42 +488,15 @@ static void rwsem_writer_wake(struct rw_
        rwsem_waiter_wake(waiter, wake_q);
 }
 
-/*
- * handle the lock release when processes blocked on it that can now run
- * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
- *   have been set.
- * - there must be someone on the queue
- * - the wait_lock must be held by the caller
- * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
- *   to actually wakeup the blocked task(s) and drop the reference count,
- *   preferably when the wait_lock is released
- * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only marked woken if downgrading is false
- *
- * Implies rwsem_del_waiter() for all woken readers.
- */
-static void rwsem_mark_wake(struct rw_semaphore *sem,
-                           enum rwsem_wake_type wake_type,
-                           struct wake_q_head *wake_q)
+static void rwsem_reader_wake(struct rw_semaphore *sem,
+                             enum rwsem_wake_type wake_type,
+                             struct rwsem_waiter *waiter,
+                             struct wake_q_head *wake_q)
 {
-       struct rwsem_waiter *waiter, *tmp;
        long count, woken = 0, adjustment = 0;
+       struct rwsem_waiter *tmp;
        struct list_head wlist;
 
-       lockdep_assert_held(&sem->wait_lock);
-
-       /*
-        * Take a peek at the queue head waiter such that we can determine
-        * the wakeup(s) to perform.
-        */
-       waiter = rwsem_first_waiter(sem);
-
-       if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
-               if (wake_type == RWSEM_WAKE_ANY)
-                       rwsem_writer_wake(sem, waiter, wake_q);
-               return;
-       }
-
        /*
         * No reader wakeup if there are too many of them already.
         */
@@ -635,6 +608,42 @@ static void rwsem_mark_wake(struct rw_se
 }
 
 /*
+ * handle the lock release when processes blocked on it that can now run
+ * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
+ *   have been set.
+ * - there must be someone on the queue
+ * - the wait_lock must be held by the caller
+ * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
+ *   to actually wakeup the blocked task(s) and drop the reference count,
+ *   preferably when the wait_lock is released
+ * - woken process blocks are discarded from the list after having task zeroed
+ * - writers are only marked woken if downgrading is false
+ *
+ * Implies rwsem_del_waiter() for all woken waiters.
+ */
+static void rwsem_mark_wake(struct rw_semaphore *sem,
+                           enum rwsem_wake_type wake_type,
+                           struct wake_q_head *wake_q)
+{
+       struct rwsem_waiter *waiter;
+
+       lockdep_assert_held(&sem->wait_lock);
+
+       /*
+        * Take a peek at the queue head waiter such that we can determine
+        * the wakeup(s) to perform.
+        */
+       waiter = rwsem_first_waiter(sem);
+
+       if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+               if (wake_type == RWSEM_WAKE_ANY)
+                       rwsem_writer_wake(sem, waiter, wake_q);
+       } else {
+               rwsem_reader_wake(sem, wake_type, waiter, wake_q);
+       }
+}
+
+/*
  * Remove a waiter and try to wake up other waiters in the wait queue
  * This function is called from the out_nolock path of both the reader and
  * writer slowpaths with wait_lock held. It releases the wait_lock and
@@ -1017,9 +1026,10 @@ rwsem_down_read_slowpath(struct rw_semap
         */
        if ((rcnt == 1) && (count & RWSEM_FLAG_WAITERS)) {
                raw_spin_lock_irq(&sem->wait_lock);
-               if (!list_empty(&sem->wait_list))
+               if (!list_empty(&sem->wait_list)) {
                        rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
                                        &wake_q);
+               }
                raw_spin_unlock_irq(&sem->wait_lock);
                wake_up_q(&wake_q);
        }
From nobody Wed Sep 10 06:16:09 2025
Message-ID: <20230223123319.608133045@infradead.org>
Date: Thu, 23 Feb 2023 13:26:47 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: longman@redhat.com, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Subject: [PATCH 5/6] locking/rwsem: Unify wait loop
References: <20230223122642.491637862@infradead.org>

Now that the reader and writer wait loops are identical, share the
code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwsem.c | 117 +++++++++++++++++++------------------------------
 1 file changed, 47 insertions(+), 70 deletions(-)

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -650,13 +650,11 @@ static void rwsem_mark_wake(struct rw_se
  * optionally wake up waiters before it returns.
  */
 static inline void
-rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
-                     struct wake_q_head *wake_q)
+rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
        __releases(&sem->wait_lock)
 {
        bool first = rwsem_first_waiter(sem) == waiter;
-
-       wake_q_init(wake_q);
+       DEFINE_WAKE_Q(wake_q);
 
        /*
         * If the wait_list isn't empty and the waiter to be deleted is
@@ -664,10 +662,10 @@ rwsem_del_wake_waiter(struct rw_semaphor
         * be eligible to acquire or spin on the lock.
         */
        if (rwsem_del_waiter(sem, waiter) && first)
-               rwsem_mark_wake(sem, RWSEM_WAKE_ANY, wake_q);
+               rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
        raw_spin_unlock_irq(&sem->wait_lock);
-       if (!wake_q_empty(wake_q))
-               wake_up_q(wake_q);
+       if (!wake_q_empty(&wake_q))
+               wake_up_q(&wake_q);
 }
 
 /*
@@ -993,6 +991,46 @@ static inline void rwsem_cond_wake_waite
        rwsem_mark_wake(sem, wake_type, wake_q);
 }
 
+#define waiter_type(_waiter, _r, _w) \
+       ((_waiter)->type == RWSEM_WAITING_FOR_READ ? (_r) : (_w))
+
+static __always_inline struct rw_semaphore *
+rwsem_waiter_wait(struct rw_semaphore *sem, struct rwsem_waiter *waiter, int state)
+{
+       trace_contention_begin(sem, waiter_type(waiter, LCB_F_READ, LCB_F_WRITE));
+
+       /* wait to be given the lock */
+       for (;;) {
+               set_current_state(state);
+               if (!smp_load_acquire(&waiter->task)) {
+                       /* Matches rwsem_waiter_wake()'s smp_store_release(). */
+                       break;
+               }
+               if (signal_pending_state(state, current)) {
+                       raw_spin_lock_irq(&sem->wait_lock);
+                       if (waiter->task)
+                               goto out_nolock;
+                       raw_spin_unlock_irq(&sem->wait_lock);
+                       /* Ordered by sem->wait_lock against rwsem_mark_wake(). */
+                       break;
+               }
+               schedule_preempt_disabled();
+               lockevent_inc(waiter_type(waiter, rwsem_sleep_reader, rwsem_sleep_writer));
+       }
+
+       __set_current_state(TASK_RUNNING);
+       lockevent_inc(waiter_type(waiter, rwsem_rlock, rwsem_wlock));
+       trace_contention_end(sem, 0);
+       return sem;
+
+out_nolock:
+       rwsem_del_wake_waiter(sem, waiter);
+       __set_current_state(TASK_RUNNING);
+       lockevent_inc(waiter_type(waiter, rwsem_rlock_fail, rwsem_wlock_fail));
+       trace_contention_end(sem, -EINTR);
+       return ERR_PTR(-EINTR);
+}
+
 /*
  * Wait for the read lock to be granted
  */
@@ -1071,38 +1109,7 @@ rwsem_down_read_slowpath(struct rw_semap
        if (!wake_q_empty(&wake_q))
                wake_up_q(&wake_q);
 
-       trace_contention_begin(sem, LCB_F_READ);
-
-       /* wait to be given the lock */
-       for (;;) {
-               set_current_state(state);
-               if (!smp_load_acquire(&waiter.task)) {
-                       /* Matches rwsem_waiter_wake()'s smp_store_release(). */
-                       break;
-               }
-               if (signal_pending_state(state, current)) {
-                       raw_spin_lock_irq(&sem->wait_lock);
-                       if (waiter.task)
-                               goto out_nolock;
-                       raw_spin_unlock_irq(&sem->wait_lock);
-                       /* Ordered by sem->wait_lock against rwsem_mark_wake(). */
-                       break;
-               }
-               schedule_preempt_disabled();
-               lockevent_inc(rwsem_sleep_reader);
-       }
-
-       __set_current_state(TASK_RUNNING);
-       lockevent_inc(rwsem_rlock);
-       trace_contention_end(sem, 0);
-       return sem;
-
-out_nolock:
-       rwsem_del_wake_waiter(sem, &waiter, &wake_q);
-       __set_current_state(TASK_RUNNING);
-       lockevent_inc(rwsem_rlock_fail);
-       trace_contention_end(sem, -EINTR);
-       return ERR_PTR(-EINTR);
+       return rwsem_waiter_wait(sem, &waiter, state);
 }
 
 /*
@@ -1150,37 +1157,7 @@ rwsem_down_write_slowpath(struct rw_sema
        }
        raw_spin_unlock_irq(&sem->wait_lock);
 
-       /* wait until we successfully acquire the lock */
-       trace_contention_begin(sem, LCB_F_WRITE);
-
-       for (;;) {
-               set_current_state(state);
-               if (!smp_load_acquire(&waiter.task)) {
-                       /* Matches rwsem_waiter_wake()'s smp_store_release(). */
-                       break;
-               }
-               if (signal_pending_state(state, current)) {
-                       raw_spin_lock_irq(&sem->wait_lock);
-                       if (waiter.task)
-                               goto out_nolock;
-                       raw_spin_unlock_irq(&sem->wait_lock);
-                       /* Ordered by sem->wait_lock against rwsem_mark_wake(). */
-                       break;
-               }
-               schedule_preempt_disabled();
-               lockevent_inc(rwsem_sleep_writer);
-       }
-       __set_current_state(TASK_RUNNING);
-       lockevent_inc(rwsem_wlock);
-       trace_contention_end(sem, 0);
-       return sem;
-
-out_nolock:
-       rwsem_del_wake_waiter(sem, &waiter, &wake_q);
-       __set_current_state(TASK_RUNNING);
-       lockevent_inc(rwsem_wlock_fail);
-       trace_contention_end(sem, -EINTR);
-       return ERR_PTR(-EINTR);
+       return rwsem_waiter_wait(sem, &waiter, state);
 }
 
 /*
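
The waiter_type() helper above is what lets one loop serve both waiter
kinds while keeping the per-type lock-event accounting; since
rwsem_waiter_wait() is __always_inline, the selection can constant-fold
at each call site. The same pattern in a self-contained sketch
(illustrative names, not kernel code):

	#include <stdio.h>

	enum waiter_kind { WAITING_FOR_READ, WAITING_FOR_WRITE };

	struct waiter { enum waiter_kind type; };

	/* Pick a per-type value inside shared code, mirroring the patch's
	 * waiter_type() macro. */
	#define waiter_type(w, r, wr) \
		((w)->type == WAITING_FOR_READ ? (r) : (wr))

	static void account_sleep(struct waiter *w)
	{
		printf("%s slept\n", waiter_type(w, "reader", "writer"));
	}

	int main(void)
	{
		struct waiter r = { WAITING_FOR_READ };
		struct waiter w = { WAITING_FOR_WRITE };

		account_sleep(&r);	/* "reader slept" */
		account_sleep(&w);	/* "writer slept" */
		return 0;
	}
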
From nobody Wed Sep 10 06:16:09 2025
Message-ID: <20230223123319.667433408@infradead.org>
Date: Thu, 23 Feb 2023 13:26:48 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: longman@redhat.com, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Subject: [PATCH 6/6] locking/rwsem: Use the force
References: <20230223122642.491637862@infradead.org>

Now that the writer adjustment is done from the wakeup side and HANDOFF
guarantees that spinning/stealing is disabled, use the combined
guarantee to ignore spurious READER_BIAS and directly claim the lock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem.c            | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -67,3 +67,4 @@ LOCK_EVENT(rwsem_rlock_handoff)	/* # of
 LOCK_EVENT(rwsem_wlock)		/* # of write locks acquired */
 LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions */
 LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs */
+LOCK_EVENT(rwsem_wlock_ehandoff)	/* # of write lock early handoffs */
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -433,6 +433,26 @@ static void rwsem_writer_wake(struct rw_
        lockdep_assert_held(&sem->wait_lock);
 
        count = atomic_long_read(&sem->count);
+
+       /*
+        * Since rwsem_mark_wake() is only called (with WAKE_ANY) when
+        * the lock is unlocked, and the HANDOFF bit guarantees that
+        * all spinning / stealing is disabled, it is possible to
+        * unconditionally claim the lock -- any READER_BIAS will be
+        * temporary.
+        */
+       if (count & RWSEM_FLAG_HANDOFF) {
+               unsigned long adjustment = RWSEM_WRITER_LOCKED - RWSEM_FLAG_HANDOFF;
+
+               if (list_is_singular(&sem->wait_list))
+                       adjustment -= RWSEM_FLAG_WAITERS;
+
+               atomic_long_set(&sem->owner, (long)waiter->task);
+               atomic_long_add(adjustment, &sem->count);
+               lockevent_inc(rwsem_wlock_ehandoff);
+               goto success;
+       }
+
        do {
                bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
@@ -479,6 +499,7 @@ static void rwsem_writer_wake(struct rw_
                return;
        }
 
+success:
        /*
        * Have rwsem_writer_wake() fully imply rwsem_del_waiter() on
        * success.
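
The "unconditional claim" amounts to bit arithmetic on ->count: with
HANDOFF known set and the writer bit known clear, a single atomic add can
set RWSEM_WRITER_LOCKED and clear RWSEM_FLAG_HANDOFF (and
RWSEM_FLAG_WAITERS when the queue drains), leaving any transient
READER_BIAS to be undone by the reader that added it. A worked example
(bit values assumed from the v6.2-era kernel/locking/rwsem.c layout):

	#include <stdio.h>

	/* Assumed bit layout, as in kernel/locking/rwsem.c (v6.2-era). */
	#define RWSEM_WRITER_LOCKED	(1UL << 0)
	#define RWSEM_FLAG_WAITERS	(1UL << 1)
	#define RWSEM_FLAG_HANDOFF	(1UL << 2)
	#define RWSEM_READER_BIAS	(1UL << 8)

	int main(void)
	{
		/* Unlocked; one queued writer that set HANDOFF; plus one
		 * speculative reader bias guaranteed to be undone. */
		unsigned long count = RWSEM_FLAG_WAITERS | RWSEM_FLAG_HANDOFF |
				      RWSEM_READER_BIAS;	/* 0x106 */

		/* One atomic add in the kernel: the subtraction acts as a
		 * bit-clear because HANDOFF/WAITERS are known set, and the
		 * addition as a bit-set because WRITER_LOCKED is known clear. */
		unsigned long adjustment = RWSEM_WRITER_LOCKED
					 - RWSEM_FLAG_HANDOFF
					 - RWSEM_FLAG_WAITERS;

		count += adjustment;

		/* Prints 0x101: WRITER_LOCKED plus the transient READER_BIAS,
		 * which the speculating reader subtracts again on its own. */
		printf("count = %#lx\n", count);
		return 0;
	}
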