From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
Date: Wed, 12 Oct 2022 09:33:32 -0400
Message-Id: <20221012133333.1265281-2-longman@redhat.com>
In-Reply-To: <20221012133333.1265281-1-longman@redhat.com>
References: <20221012133333.1265281-1-longman@redhat.com>

A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping, yet fail to acquire the
lock even though the rwsem is free, if the following sequence happens:

  Non-first waiter           First waiter                 Lock holder
  ----------------           ------------                 -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                             Acquire wait_lock
                             Inherit waiter->handoff_set
                             Release wait_lock
                                                          Clear owner
                                                          Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
    if (first->handoff_set && (waiter != first))
      return false;
  Release wait_lock

This is especially problematic if the non-first waiter is an RT task
running on the same CPU as the first waiter, as it can lead to a
livelock.
Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..3839b38608da 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -636,6 +636,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
+			/*
+			 * A waiter (first or not) can set the handoff bit
+			 * if it is an RT task or has waited in the wait
+			 * queue for too long.
+			 */
 			if (has_handoff || (!rt_task(waiter->task) &&
 					    !time_after(jiffies, waiter->timeout)))
 				return false;
@@ -651,11 +656,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in the slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		if (waiter == first)
+			waiter->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
-- 
2.31.1

From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷, Waiman Long
Subject: [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer
Date: Wed, 12 Oct 2022 09:33:33 -0400
Message-Id: <20221012133333.1265281-3-longman@redhat.com>
In-Reply-To: <20221012133333.1265281-1-longman@redhat.com>
References: <20221012133333.1265281-1-longman@redhat.com>

Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
spin on owner") assumes that when the owner field is changed to NULL,
the lock will become free soon. That assumption may not be correct,
especially if the handoff writer doing the spinning is an RT task that
may preempt another task and prevent it from completing its action of
either freeing the rwsem or properly setting up the owner.

To prevent this livelock scenario, we have to limit the number of
trylock attempts made without sleeping. The limit is set to 8 to allow
enough time for the other task to, hopefully, complete its action.
By adding new lock events to track the number of NULL owner retries
with the handoff flag set before a successful trylock, and running a
96-thread locking microbenchmark with an equal number of readers and
writers on a 2-core 96-thread system for 15 seconds, the following
stats were obtained. Note that none of the locking threads are RT
tasks.

  Retries of successful trylock    Count
  -----------------------------    -----
                1                   1738
                2                     19
                3                     11
                4                      2
                5                      1
                6                      1
                7                      1
                8                      0
                X                      1

The last row is the one failed attempt that needed more than 8
retries. So a maximum retry count of 8 should capture almost all of
them if no RT task is in the mix.

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha
Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 3839b38608da..12eb093328f2 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1123,6 +1123,7 @@ static struct rw_semaphore __sched *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
+	int null_owner_retries;
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
@@ -1164,7 +1165,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);
 
-	for (;;) {
+	for (null_owner_retries = 0;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1190,8 +1191,21 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			owner_state = rwsem_spin_on_owner(sem);
 			preempt_enable();
 
-			if (owner_state == OWNER_NULL)
+			/*
+			 * A NULL owner doesn't guarantee the lock is free.
+			 * An incoming reader will temporarily increment the
+			 * reader count without changing owner and
+			 * rwsem_try_write_lock() will fail if the reader
+			 * is not able to decrement it in time. Allow 8
+			 * trylock attempts when hitting a NULL owner before
+			 * going to sleep.
+			 */
+			if ((owner_state == OWNER_NULL) &&
+			    (null_owner_retries < 8)) {
+				null_owner_retries++;
 				goto trylock_again;
+			}
+			null_owner_retries = 0;
 		}
 
 		schedule();
-- 
2.31.1