[PATCH] locking/rwsem: Remove reader optimistic lock stealing

Peng Wang posted 1 patch 3 days, 10 hours ago
kernel/locking/lock_events_list.h |  1 -
kernel/locking/rwsem.c            | 33 ---------------------------------
2 files changed, 34 deletions(-)
[PATCH] locking/rwsem: Remove reader optimistic lock stealing
Posted by Peng Wang 3 days, 10 hours ago
Reader optimistic lock stealing, introduced by commit 1a728dff855a
("locking/rwsem: Enable reader optimistic lock stealing") and made more
aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
optimistic spinning"), allows a reader entering the slowpath to bypass
the wait queue and acquire the lock directly when WRITER_LOCKED and
HANDOFF bits are not set.

This causes severe writer starvation in workloads where readers hold
the lock for extended periods, such as Direct I/O operations which
hold inode->i_rwsem for the entire duration of iomap_dio_rw().  A
common example is log-structured storage where one thread appends via
DIO writes while another thread tails the log via DIO reads -- a
pattern seen in database redo-log replay and shared-storage
replication.

The problem occurs because:

1. A reader entering the slowpath (due to RWSEM_FLAG_WAITERS being set)
   can still steal the lock as the steal condition only checks
   WRITER_LOCKED and HANDOFF, not WAITERS. This is inconsistent with
   the fast path which already blocks readers when WAITERS is set (via
   RWSEM_READ_FAILED_MASK).

2. In the window between the last reader releasing the lock and the
   waiting writer being scheduled (~10-100us), a new reader can steal
   the lock in ~50-100ns. This race is structurally inevitable due to
   the 1000x speed difference between atomic operations vs context
   switching.

3. Each stolen read lock is held for the full DIO duration (potentially
   milliseconds), and the pattern repeats until the 4ms HANDOFF timeout.
   This effectively taxes every write operation with a ~4ms penalty.

Performance impact measured with a DIO mixed read/write workload
(1 writer + 1 reader, O_DIRECT, ext4):

  NVMe SSD:              Before          After
  Write-only baseline:   397 MB/s        397 MB/s (no change)
  Mixed write throughput: 11 MB/s        350 MB/s (+31x)
  Mixed write latency:   880 us          23 us   (-38x)
  Mixed read throughput:  95 MB/s         95 MB/s (no change)

Fixes: 617f3ef95177 ("locking/rwsem: Remove reader optimistic spinning")
Signed-off-by: Peng Wang <peng_wang@linux.alibaba.com>
---
 kernel/locking/lock_events_list.h |  1 -
 kernel/locking/rwsem.c            | 33 ---------------------------------
 2 files changed, 34 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 97fb6f3f840a..35b45576bee4 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -65,7 +65,6 @@ LOCK_EVENT(rwsem_opt_lock)	/* # of opt-acquired write locks	*/
 LOCK_EVENT(rwsem_opt_fail)	/* # of failed optspins			*/
 LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled optspins		*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
-LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
 LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index bda5577339c0..40b141c5765f 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1017,42 +1017,9 @@ static struct rw_semaphore __sched *
 rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int state)
 {
 	long adjustment = -RWSEM_READER_BIAS;
-	long rcnt = (count >> RWSEM_READER_SHIFT);
 	struct rwsem_waiter waiter, *first;
 	DEFINE_WAKE_Q(wake_q);

-	/*
-	 * To prevent a constant stream of readers from starving a sleeping
-	 * writer, don't attempt optimistic lock stealing if the lock is
-	 * very likely owned by readers.
-	 */
-	if ((atomic_long_read(&sem->owner) & RWSEM_READER_OWNED) &&
-	    (rcnt > 1) && !(count & RWSEM_WRITER_LOCKED))
-		goto queue;
-
-	/*
-	 * Reader optimistic lock stealing.
-	 */
-	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
-		rwsem_set_reader_owned(sem);
-		lockevent_inc(rwsem_rlock_steal);
-
-		/*
-		 * Wake up other readers in the wait queue if it is
-		 * the first reader.
-		 */
-		if ((rcnt == 1) && (count & RWSEM_FLAG_WAITERS)) {
-			raw_spin_lock_irq(&sem->wait_lock);
-			if (sem->first_waiter)
-				rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
-						&wake_q);
-			raw_spin_unlock_irq(&sem->wait_lock);
-			wake_up_q(&wake_q);
-		}
-		return sem;
-	}
-
-queue:
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
--
2.39.3
Re: [PATCH] locking/rwsem: Remove reader optimistic lock stealing
Posted by Waiman Long 2 days, 18 hours ago
On 5/21/26 5:59 AM, Peng Wang wrote:
> Reader optimistic lock stealing, introduced by commit 1a728dff855a
> ("locking/rwsem: Enable reader optimistic lock stealing") and made more
> aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
> optimistic spinning"), allows a reader entering the slowpath to bypass
> the wait queue and acquire the lock directly when WRITER_LOCKED and
> HANDOFF bits are not set.
>
> This causes severe writer starvation in workloads where readers hold
> the lock for extended periods, such as Direct I/O operations which
> hold inode->i_rwsem for the entire duration of iomap_dio_rw().  A
> common example is log-structured storage where one thread appends via
> DIO writes while another thread tails the log via DIO reads -- a
> pattern seen in database redo-log replay and shared-storage
> replication.

It is generally assume that reader lock critical section is shorter than 
that of writer. In this particular case, does the reader critical 
section run longer than the writer's one?

Reader lock stealing should only happen if the previous lock owner is a 
writer. So readers and writer should at most alternately own the lock if 
there are many readers waiting. Of course, if a reader own the lock, it 
will wake up the remaining readers in the wait queue.

>
> The problem occurs because:
>
> 1. A reader entering the slowpath (due to RWSEM_FLAG_WAITERS being set)
>     can still steal the lock as the steal condition only checks
>     WRITER_LOCKED and HANDOFF, not WAITERS. This is inconsistent with
>     the fast path which already blocks readers when WAITERS is set (via
>     RWSEM_READ_FAILED_MASK).
>
> 2. In the window between the last reader releasing the lock and the
>     waiting writer being scheduled (~10-100us), a new reader can steal
>     the lock in ~50-100ns. This race is structurally inevitable due to
>     the 1000x speed difference between atomic operations vs context
>     switching.
>
> 3. Each stolen read lock is held for the full DIO duration (potentially
>     milliseconds), and the pattern repeats until the 4ms HANDOFF timeout.
>     This effectively taxes every write operation with a ~4ms penalty.
>
> Performance impact measured with a DIO mixed read/write workload
> (1 writer + 1 reader, O_DIRECT, ext4):
>
>    NVMe SSD:              Before          After
>    Write-only baseline:   397 MB/s        397 MB/s (no change)
>    Mixed write throughput: 11 MB/s        350 MB/s (+31x)
>    Mixed write latency:   880 us          23 us   (-38x)
>    Mixed read throughput:  95 MB/s         95 MB/s (no change)

Does the reader wait for something else and taking other locks after 
acquiring the read lock? Can you point me of the reader critical section 
for this particular workload?

Cheers,
Longman

>
> Fixes: 617f3ef95177 ("locking/rwsem: Remove reader optimistic spinning")
> Signed-off-by: Peng Wang <peng_wang@linux.alibaba.com>
> ---
>   kernel/locking/lock_events_list.h |  1 -
>   kernel/locking/rwsem.c            | 33 ---------------------------------
>   2 files changed, 34 deletions(-)
>
> diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
> index 97fb6f3f840a..35b45576bee4 100644
> --- a/kernel/locking/lock_events_list.h
> +++ b/kernel/locking/lock_events_list.h
> @@ -65,7 +65,6 @@ LOCK_EVENT(rwsem_opt_lock)	/* # of opt-acquired write locks	*/
>   LOCK_EVENT(rwsem_opt_fail)	/* # of failed optspins			*/
>   LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled optspins		*/
>   LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
> -LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
>   LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
>   LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
>   LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index bda5577339c0..40b141c5765f 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1017,42 +1017,9 @@ static struct rw_semaphore __sched *
>   rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int state)
>   {
>   	long adjustment = -RWSEM_READER_BIAS;
> -	long rcnt = (count >> RWSEM_READER_SHIFT);
>   	struct rwsem_waiter waiter, *first;
>   	DEFINE_WAKE_Q(wake_q);
>
> -	/*
> -	 * To prevent a constant stream of readers from starving a sleeping
> -	 * writer, don't attempt optimistic lock stealing if the lock is
> -	 * very likely owned by readers.
> -	 */
> -	if ((atomic_long_read(&sem->owner) & RWSEM_READER_OWNED) &&
> -	    (rcnt > 1) && !(count & RWSEM_WRITER_LOCKED))
> -		goto queue;
> -
> -	/*
> -	 * Reader optimistic lock stealing.
> -	 */
> -	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
> -		rwsem_set_reader_owned(sem);
> -		lockevent_inc(rwsem_rlock_steal);
> -
> -		/*
> -		 * Wake up other readers in the wait queue if it is
> -		 * the first reader.
> -		 */
> -		if ((rcnt == 1) && (count & RWSEM_FLAG_WAITERS)) {
> -			raw_spin_lock_irq(&sem->wait_lock);
> -			if (sem->first_waiter)
> -				rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
> -						&wake_q);
> -			raw_spin_unlock_irq(&sem->wait_lock);
> -			wake_up_q(&wake_q);
> -		}
> -		return sem;
> -	}
> -
> -queue:
>   	waiter.task = current;
>   	waiter.type = RWSEM_WAITING_FOR_READ;
>   	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
> --
> 2.39.3
>
Re: [PATCH] locking/rwsem: Remove reader optimistic lock stealing
Posted by Peter Zijlstra 2 days, 11 hours ago
On Thu, May 21, 2026 at 10:08:58PM -0400, Waiman Long wrote:
> On 5/21/26 5:59 AM, Peng Wang wrote:
> > Reader optimistic lock stealing, introduced by commit 1a728dff855a
> > ("locking/rwsem: Enable reader optimistic lock stealing") and made more
> > aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
> > optimistic spinning"), allows a reader entering the slowpath to bypass
> > the wait queue and acquire the lock directly when WRITER_LOCKED and
> > HANDOFF bits are not set.
> > 
> > This causes severe writer starvation in workloads where readers hold
> > the lock for extended periods, such as Direct I/O operations which
> > hold inode->i_rwsem for the entire duration of iomap_dio_rw().  A
> > common example is log-structured storage where one thread appends via
> > DIO writes while another thread tails the log via DIO reads -- a
> > pattern seen in database redo-log replay and shared-storage
> > replication.
> 
> It is generally assume that reader lock critical section is shorter than
> that of writer. In this particular case, does the reader critical section
> run longer than the writer's one?

Well, that and writers are assumed to be rare. Reader-writer setups
where writers are common or even dominant make little sense. And that
seems to be exactly this. Then again, it isn't unreasonable to expect it
to not perform significantly worse than an exclusive lock.

> Reader lock stealing should only happen if the previous lock owner is a
> writer. So readers and writer should at most alternately own the lock if
> there are many readers waiting. Of course, if a reader own the lock, it will
> wake up the remaining readers in the wait queue.

Anyway, IIRC I've mentioned phase change locks many times before. And
what we have here is an asymmetric phase change. The timeout causes a
change to writers, but any one writer completing then switches back to
reader dominance.

Perhaps look at evening out the phase change. Retain the 'no-steal'
phase for an equal duration.

Also, 4ms is an eternity, that might need tweaking too.
Re: [PATCH] locking/rwsem: Remove reader optimistic lock stealing
Posted by Peng Wang 2 days, 10 hours ago
On Fri, May 22, 2026 at 10:55:13AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2026 at 10:08:58PM -0400, Waiman Long wrote:
> > On 5/21/26 5:59 AM, Peng Wang wrote:
> > > Reader optimistic lock stealing, introduced by commit 1a728dff855a
> > > ("locking/rwsem: Enable reader optimistic lock stealing") and made more
> > > aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
> > > optimistic spinning"), allows a reader entering the slowpath to bypass
> > > the wait queue and acquire the lock directly when WRITER_LOCKED and
> > > HANDOFF bits are not set.
> > > 
> > > This causes severe writer starvation in workloads where readers hold
> > > the lock for extended periods, such as Direct I/O operations which
> > > hold inode->i_rwsem for the entire duration of iomap_dio_rw().  A
> > > common example is log-structured storage where one thread appends via
> > > DIO writes while another thread tails the log via DIO reads -- a
> > > pattern seen in database redo-log replay and shared-storage
> > > replication.
> > 
> > It is generally assume that reader lock critical section is shorter than
> > that of writer. In this particular case, does the reader critical section
> > run longer than the writer's one?
> 
> Well, that and writers are assumed to be rare. Reader-writer setups
> where writers are common or even dominant make little sense. And that
> seems to be exactly this. Then again, it isn't unreasonable to expect it
> to not perform significantly worse than an exclusive lock.
> 
> > Reader lock stealing should only happen if the previous lock owner is a
> > writer. So readers and writer should at most alternately own the lock if
> > there are many readers waiting. Of course, if a reader own the lock, it will
> > wake up the remaining readers in the wait queue.
> 
> Anyway, IIRC I've mentioned phase change locks many times before. And
> what we have here is an asymmetric phase change. The timeout causes a
> change to writers, but any one writer completing then switches back to
> reader dominance.
> 
> Perhaps look at evening out the phase change. Retain the 'no-steal'
> phase for an equal duration.
> 
> Also, 4ms is an eternity, that might need tweaking too.

Hi Peter,

I agree this is an asymmetric phase change. Looking at the code, the asymmetry
appears to exist only through the steal path, which is only reachable when
RWSEM_FLAG_WAITERS triggers slowpath entry (since READ_FAILED_MASK includes WAITERS,
and the steal condition requires WRITER_LOCKED=0 and HANDOFF=0).

With steal removed, the wait queue with phase-fair batch wakeup seems to provide naturally
symmetric transitions without explicit phase management:
  - Writer at queue head -> writer runs
  - Reader at queue head -> all readers batch-woken in parallel
  - Alternation governed by arrival order

A read-heavy mmap benchmark (16 threads, short critical sections) showed no measurable difference
with steal removed (308k vs 306k ops/sec), as readers succeed via fast path when there is no contention.

Regarding the timeout: agreed 4ms feels too long for fast storage.
With steal removed, handoff would only matter as a backstop when a writer enters the queue while readers are already active.

Would the simple removal be acceptable, or would you prefer a more structured phase-change approach?

Best regards,
Peng
Re: [PATCH] locking/rwsem: Remove reader optimistic lock stealing
Posted by Peter Zijlstra 2 days, 11 hours ago
On Fri, May 22, 2026 at 10:55:13AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2026 at 10:08:58PM -0400, Waiman Long wrote:
> > On 5/21/26 5:59 AM, Peng Wang wrote:
> > > Reader optimistic lock stealing, introduced by commit 1a728dff855a
> > > ("locking/rwsem: Enable reader optimistic lock stealing") and made more
> > > aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
> > > optimistic spinning"), allows a reader entering the slowpath to bypass
> > > the wait queue and acquire the lock directly when WRITER_LOCKED and
> > > HANDOFF bits are not set.
> > > 
> > > This causes severe writer starvation in workloads where readers hold
> > > the lock for extended periods, such as Direct I/O operations which
> > > hold inode->i_rwsem for the entire duration of iomap_dio_rw().  A
> > > common example is log-structured storage where one thread appends via
> > > DIO writes while another thread tails the log via DIO reads -- a
> > > pattern seen in database redo-log replay and shared-storage
> > > replication.
> > 
> > It is generally assume that reader lock critical section is shorter than
> > that of writer. In this particular case, does the reader critical section
> > run longer than the writer's one?
> 
> Well, that and writers are assumed to be rare. Reader-writer setups
> where writers are common or even dominant make little sense. And that
> seems to be exactly this. Then again, it isn't unreasonable to expect it
> to not perform significantly worse than an exclusive lock.
> 
> > Reader lock stealing should only happen if the previous lock owner is a
> > writer. So readers and writer should at most alternately own the lock if
> > there are many readers waiting. Of course, if a reader own the lock, it will
> > wake up the remaining readers in the wait queue.
> 
> Anyway, IIRC I've mentioned phase change locks many times before. And
> what we have here is an asymmetric phase change. The timeout causes a
> change to writers, but any one writer completing then switches back to
> reader dominance.
> 
> Perhaps look at evening out the phase change. Retain the 'no-steal'
> phase for an equal duration.
> 
> Also, 4ms is an eternity, that might need tweaking too.

Also, perhaps reduce MAX_READERS_WAKEUP when in the 'writer' phase of
things.
Re: [PATCH] locking/rwsem: Remove reader optimistic lock stealing
Posted by Peng Wang 2 days, 17 hours ago
On Thu, May 21, 2026 at 10:08:58PM -0400, Waiman Long wrote:
> On 5/21/26 5:59 AM, Peng Wang wrote:
> > Reader optimistic lock stealing, introduced by commit 1a728dff855a
> > ("locking/rwsem: Enable reader optimistic lock stealing") and made more
> > aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
> > optimistic spinning"), allows a reader entering the slowpath to bypass
> > the wait queue and acquire the lock directly when WRITER_LOCKED and
> > HANDOFF bits are not set.
> > 
> > This causes severe writer starvation in workloads where readers hold
> > the lock for extended periods, such as Direct I/O operations which
> > hold inode->i_rwsem for the entire duration of iomap_dio_rw().  A
> > common example is log-structured storage where one thread appends via
> > DIO writes while another thread tails the log via DIO reads -- a
> > pattern seen in database redo-log replay and shared-storage
> > replication.
> 
> It is generally assume that reader lock critical section is shorter than
> that of writer. In this particular case, does the reader critical section
> run longer than the writer's one?

Hi Longman,
Thanks for the quick reply.

In this workload both reader and writer hold i_rwsem across iomap_dio_rw(),
so the critical sections are not short, and they cover the full disk I/O latency
about hundreds of microseconds on NVMe

> 
> Reader lock stealing should only happen if the previous lock owner is a
> writer. So readers and writer should at most alternately own the lock if
> there are many readers waiting. Of course, if a reader own the lock, it will
> wake up the remaining readers in the wait queue.

Because the steal (an atomic operation, ~50ns) is orders of magnitude faster than
the writer wakeup path (context switch, ~10-100us), the new reader wins this race
every time the lock becomes free.
This repeats with each subsequent reader arrival until the HANDOFF timeout.

> 
> > 
> > The problem occurs because:
> > 
> > 1. A reader entering the slowpath (due to RWSEM_FLAG_WAITERS being set)
> >     can still steal the lock as the steal condition only checks
> >     WRITER_LOCKED and HANDOFF, not WAITERS. This is inconsistent with
> >     the fast path which already blocks readers when WAITERS is set (via
> >     RWSEM_READ_FAILED_MASK).
> > 
> > 2. In the window between the last reader releasing the lock and the
> >     waiting writer being scheduled (~10-100us), a new reader can steal
> >     the lock in ~50-100ns. This race is structurally inevitable due to
> >     the 1000x speed difference between atomic operations vs context
> >     switching.
> > 
> > 3. Each stolen read lock is held for the full DIO duration (potentially
> >     milliseconds), and the pattern repeats until the 4ms HANDOFF timeout.
> >     This effectively taxes every write operation with a ~4ms penalty.
> > 
> > Performance impact measured with a DIO mixed read/write workload
> > (1 writer + 1 reader, O_DIRECT, ext4):
> > 
> >    NVMe SSD:              Before          After
> >    Write-only baseline:   397 MB/s        397 MB/s (no change)
> >    Mixed write throughput: 11 MB/s        350 MB/s (+31x)
> >    Mixed write latency:   880 us          23 us   (-38x)
> >    Mixed read throughput:  95 MB/s         95 MB/s (no change)
> 
> Does the reader wait for something else and taking other locks after
> acquiring the read lock? Can you point me of the reader critical section for
> this particular workload?

The reader does not wait on anything else or take other locks after acquiring i_rwsem.
The critical section is as below:

 ext4_dio_read_iter()  (fs/ext4/file.c:70)
    inode_lock_shared(inode);
    ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, 0, NULL, 0);
    inode_unlock_shared(inode);

The lock is held across iomap_dio_rw() which submits the bio and waits for I/O completion synchronously.
The hold time is essentially the device latency for a single DIO read.

> 
> Cheers,
> Longman
> 
> > 
> > Fixes: 617f3ef95177 ("locking/rwsem: Remove reader optimistic spinning")
> > Signed-off-by: Peng Wang <peng_wang@linux.alibaba.com>
> > ---
> >   kernel/locking/lock_events_list.h |  1 -
> >   kernel/locking/rwsem.c            | 33 ---------------------------------
> >   2 files changed, 34 deletions(-)
> > 
> > diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
> > index 97fb6f3f840a..35b45576bee4 100644
> > --- a/kernel/locking/lock_events_list.h
> > +++ b/kernel/locking/lock_events_list.h
> > @@ -65,7 +65,6 @@ LOCK_EVENT(rwsem_opt_lock)	/* # of opt-acquired write locks	*/
> >   LOCK_EVENT(rwsem_opt_fail)	/* # of failed optspins			*/
> >   LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled optspins		*/
> >   LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
> > -LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
> >   LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
> >   LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
> >   LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
> > diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> > index bda5577339c0..40b141c5765f 100644
> > --- a/kernel/locking/rwsem.c
> > +++ b/kernel/locking/rwsem.c
> > @@ -1017,42 +1017,9 @@ static struct rw_semaphore __sched *
> >   rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int state)
> >   {
> >   	long adjustment = -RWSEM_READER_BIAS;
> > -	long rcnt = (count >> RWSEM_READER_SHIFT);
> >   	struct rwsem_waiter waiter, *first;
> >   	DEFINE_WAKE_Q(wake_q);
> > 
> > -	/*
> > -	 * To prevent a constant stream of readers from starving a sleeping
> > -	 * writer, don't attempt optimistic lock stealing if the lock is
> > -	 * very likely owned by readers.
> > -	 */
> > -	if ((atomic_long_read(&sem->owner) & RWSEM_READER_OWNED) &&
> > -	    (rcnt > 1) && !(count & RWSEM_WRITER_LOCKED))
> > -		goto queue;
> > -
> > -	/*
> > -	 * Reader optimistic lock stealing.
> > -	 */
> > -	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
> > -		rwsem_set_reader_owned(sem);
> > -		lockevent_inc(rwsem_rlock_steal);
> > -
> > -		/*
> > -		 * Wake up other readers in the wait queue if it is
> > -		 * the first reader.
> > -		 */
> > -		if ((rcnt == 1) && (count & RWSEM_FLAG_WAITERS)) {
> > -			raw_spin_lock_irq(&sem->wait_lock);
> > -			if (sem->first_waiter)
> > -				rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
> > -						&wake_q);
> > -			raw_spin_unlock_irq(&sem->wait_lock);
> > -			wake_up_q(&wake_q);
> > -		}
> > -		return sem;
> > -	}
> > -
> > -queue:
> >   	waiter.task = current;
> >   	waiter.type = RWSEM_WAITING_FOR_READ;
> >   	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
> > --
> > 2.39.3
> > 
>