[PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client

Posted by Wang Zhaolong 2 months ago
I've been investigating a pretty nasty memory leak in the SMB client. When
compound requests get interrupted by signals, we end up with mid_q_entry
structures and server buffers that never get freed[1].

User foreground process                    cifsd
cifs_readdir
 open_cached_dir
  cifs_send_recv
   compound_send_recv
    smb2_setup_request
     smb2_mid_entry_alloc
      smb2_get_mid_entry
       smb2_mid_entry_alloc
        mempool_alloc // alloc mid
        kref_init(&temp->refcount); // refcount = 1
     mid[0]->callback = cifs_compound_callback;
     mid[1]->callback = cifs_compound_last_callback;
     smb_send_rqst
     rc = wait_for_response
      wait_event_state TASK_KILLABLE
                                  cifs_demultiplex_thread
                                    allocate_buffers
                                      server->bigbuf = cifs_buf_get()
                                    standard_receive3
                                      ->find_mid()
                                        smb2_find_mid
                                          __smb2_find_mid
                                           kref_get(&mid->refcount) // +1
                                      cifs_handle_standard
                                        handle_mid
                                         /* bigbuf will also leak */
                                         mid->resp_buf = server->bigbuf
                                         server->bigbuf = NULL;
                                         dequeue_mid
                                     /* in for loop */
                                    mids[0]->callback
                                      cifs_compound_callback
    /* Signal interrupts wait: rc = -ERESTARTSYS */
    /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) */
    midQ[0]->callback = cifs_cancelled_callback;
    cancelled_mid[i] = true;
                                       /* The change comes too late */
                                       mid->mid_state = MID_RESPONSE_READY
                                    release_mid  // -1
    /* cancelled_mid[i] == true means the mid won't be released
       in the compound_send_recv cleanup path */
    /* cifs_cancelled_callback is never executed to release the mid */

The core issue is a race condition where cifs_cancelled_callback never
gets a chance to run, so cleanup never happens. I've spent quite a bit
of time trying to understand how to fix this safely.

Honestly, my first instinct was to just patch the callback assignment
logic directly. But the more I dug into it, the more I realized that
the current locking scheme makes this really tricky to do safely. We
have one big lock protecting multiple different things, and trying to
fix the race condition directly felt like playing with fire.

I kept running into scenarios where a "simple" fix could introduce
deadlocks or new race conditions. After looking at this from different
angles, I came to the conclusion that I needed to refactor the locking
first to create a safe foundation for the actual fix.

Patches 1-3 are foundational refactoring. These three patches rename
locks for clarity, separate counter protection from queue operations,
and replace the confusing mid_flags bitmask with explicit boolean
fields. Basically, they untangle the current locking mess so I can
implement the real fix without breaking anything.

The 4th patch in the series is where the real fix happens. With
the previous refactoring in place, I could safely add a lock to each
mid_q_entry and implement atomic callback execution. This eliminates
the race condition that was causing the leaks.

In summary, my approach to the fix is to use finer-grained locking to
avoid the race condition. However, the implementation ended up
involving more changes than I initially hoped for. If
there's a simpler or more elegant way to fix this race condition that
I've missed, I'd love to hear about it. I've tried to be thorough in
my analysis, but I know there are folks with more experience in this
codebase who might see a better path.

V1 -> V2:
  - Inline mid_execute_callback() in smb2ops.c to eliminate
    the sparse warning.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404 [1]

Wang Zhaolong (4):
  smb: client: rename server mid_lock to mid_queue_lock
  smb: client: add mid_counter_lock to protect the mid counter
  smb: client: eliminate mid_flags field
  smb: client: fix mid_q_entry memory leak with per-mid locking

 fs/smb/client/cifs_debug.c    | 12 ++++--
 fs/smb/client/cifsglob.h      | 22 ++++++-----
 fs/smb/client/connect.c       | 57 +++++++++++++++++----------
 fs/smb/client/smb1ops.c       | 23 +++++++----
 fs/smb/client/smb2ops.c       | 72 +++++++++++++++++++----------------
 fs/smb/client/smb2transport.c |  5 ++-
 fs/smb/client/transport.c     | 71 ++++++++++++++++++----------------
 7 files changed, 152 insertions(+), 110 deletions(-)

-- 
2.39.2
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Enzo Matsumiya 2 months ago
Hi Wang,

On 08/05, Wang Zhaolong wrote:
>I've been investigating a pretty nasty memory leak in the SMB client. When
>compound requests get interrupted by signals, we end up with mid_q_entry
>structures and server buffers that never get freed[1].
>
>User foreground process                    cifsd
>cifs_readdir
> open_cached_dir
>  cifs_send_recv
>   compound_send_recv
>    smb2_setup_request
>     smb2_mid_entry_alloc
>      smb2_get_mid_entry
>       smb2_mid_entry_alloc
>        mempool_alloc // alloc mid
>        kref_init(&temp->refcount); // refcount = 1
>     mid[0]->callback = cifs_compound_callback;
>     mid[1]->callback = cifs_compound_last_callback;
>     smb_send_rqst
>     rc = wait_for_response
>      wait_event_state TASK_KILLABLE
>                                  cifs_demultiplex_thread
>                                    allocate_buffers
>                                      server->bigbuf = cifs_buf_get()
>                                    standard_receive3
>                                      ->find_mid()
>                                        smb2_find_mid
>                                          __smb2_find_mid
>                                           kref_get(&mid->refcount) // +1
>                                      cifs_handle_standard
>                                        handle_mid
>                                         /* bigbuf will also leak */
>                                         mid->resp_buf = server->bigbuf
>                                         server->bigbuf = NULL;
>                                         dequeue_mid
>                                     /* in for loop */
>                                    mids[0]->callback
>                                      cifs_compound_callback
>    /* Signal interrupts wait: rc = -ERESTARTSYS */
>    /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *?
>    midQ[0]->callback = cifs_cancelled_callback;
>    cancelled_mid[i] = true;
>                                       /* The change comes too late */
>                                       mid->mid_state = MID_RESPONSE_READY
>                                    release_mid  // -1
>    /* cancelled_mid[i] == true causes mid won't be released
>       in compound_send_recv cleanup */
>    /* cifs_cancelled_callback won't executed to release mid */
>
>The core issue is a race condition where cifs_cancelled_callback never
>gets a chance to run, so cleanup never happens. I've spent quite a bit
>of time trying to understand how to fix this safely.

Do you have a reproducer for this?  mids are allocated from kmem cache,
and a leak should certainly be visible (WARN on rmmod), even without any
debugging facilities enabled.

However, I do know that the following problem is quite common in cifs:

thread 0        | thread 1
----------------|----------------
                | lock
                | check data
                | data is valid
                | unlock
lock            |
invalidate data | lock (spins)
unlock          | ...
                | // assumes data still valid
                | use invalid data (not really freed)
                | unlock

You see that no matter how many locks you add to protect data, there's
still a chance of having this "race condition" feeling.

So, personally, I'm skeptical about having yet another spinlock with
questionable or no effect at all.

But again, if I can reproduce this bug myself, it'll be much easier to
analyse effectiveness/review your patches.

Apart from that, cleanup patches always get my +1 :)


Cheers,

Enzo

>Honestly, my first instinct was to just patch the callback assignment
>logic directly. But the more I dug into it, the more I realized that
>the current locking scheme makes this really tricky to do safely. We
>have one big lock protecting multiple different things, and trying to
>fix the race condition directly felt like playing with fire.
>
>I kept running into scenarios where a "simple" fix could introduce
>deadlocks or new race conditions. After looking at this from different
>angles, I came to the conclusion that I needed to refactor the locking
>first to create a safe foundation for the actual fix.
>
>Patches 1-3 are foundational refactoring. These three patches rename
>locks for clarity, separate counter protection from queue operations,
>and replace the confusing mid_flags bitmask with explicit boolean
>fields. Basically, they untangle the current locking mess so I can
>implement the real fix without breaking anything.
>
>The 4th patch in the series is where the real fix happens. With
>the previous refactoring in place, I could safely add a lock to each
>mid_q_entry and implement atomic callback execution. This eliminates
>the race condition that was causing the leaks.
>
>In summary, my approach to the fix is to use smaller-grained locking to
>avoid race conditions. However, during the implementation process,
>this approach involves more changes than I initially hoped for. If
>there's a simpler or more elegant way to fix this race condition that
>I've missed, I'd love to hear about it. I've tried to be thorough in
>my analysis, but I know there are folks with more experience in this
>codebase who might see a better path.
>
>V1 -> V2:
>  - Inline the mid_execute_callback() in the smb2ops.c to eliminate
>    the sparse warning.
>
>Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404 [1]
>
>Wang Zhaolong (4):
>  smb: client: rename server mid_lock to mid_queue_lock
>  smb: client: add mid_counter_lock to protect the mid counter counter
>  smb: client: smb: client: eliminate mid_flags field
>  smb: client: fix mid_q_entry memleak leak with per-mid locking
>
> fs/smb/client/cifs_debug.c    | 12 ++++--
> fs/smb/client/cifsglob.h      | 22 ++++++-----
> fs/smb/client/connect.c       | 57 +++++++++++++++++----------
> fs/smb/client/smb1ops.c       | 23 +++++++----
> fs/smb/client/smb2ops.c       | 72 +++++++++++++++++++----------------
> fs/smb/client/smb2transport.c |  5 ++-
> fs/smb/client/transport.c     | 71 ++++++++++++++++++----------------
> 7 files changed, 152 insertions(+), 110 deletions(-)
>
>-- 
>2.39.2
>
>
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Wang Zhaolong 2 months ago
> 
> Do you have a reproducer for this?  mids are allocated from kmem cache,
> and a leak should certainly be visible (WARN on rmmod), even without any
> debugging facilities enabled.

Thanks for your reply and review!

Yes, I have the reproducer. I have listed it in the commit message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404

> 
> However, I do know that the following problem is quite common in cifs:
> 
> thread 0        | thread 1
> ----------------|----------------
>                 | lock
>                 | check data
>                 | data is valid
>                 | unlock
> lock            |
> invalidate data | lock (spins)
> unlock          | ...
>                 | // assumes data still valid
>                 | use invalid data (not really freed)
>                 | unlock
> 
> You see that no matter how many locks you add to protect data, there's
> still a chance of having this "race condition" feeling.
> 
> So, personally, I'm skeptical about having yet another spinlock with
> questionable or no effect at all.
> 
> But again, if I can reproduce this bug myself, it'll be much easier to
> analyse effectiveness/review your patches.
> 
> Apart from that, cleanup patches always get my +1 :)
> 
The data is interrelated, and updates to it are protected by the
per-mid lock to avoid exactly that kind of race.

For example, in the following scenario, both the update and the check
of mid->mid_state happen under spin_lock(&midQ[i]->mid_lock), which
prevents the leak.


```
User foreground process                    cifsd
cifs_readdir
  open_cached_dir
   cifs_send_recv
    compound_send_recv
     smb2_setup_request
      smb2_mid_entry_alloc
       smb2_get_mid_entry
        smb2_mid_entry_alloc
         mempool_alloc // alloc mid
         kref_init(&temp->refcount); // refcount = 1
      mid[0]->callback = cifs_compound_callback;
      mid[1]->callback = cifs_compound_last_callback;
      smb_send_rqst
      rc = wait_for_response
       wait_event_state TASK_KILLABLE
                                   cifs_demultiplex_thread
                                     allocate_buffers
                                       server->bigbuf = cifs_buf_get()
                                     standard_receive3
                                       ->find_mid()
                                         smb2_find_mid
                                           __smb2_find_mid
                                            kref_get(&mid->refcount) // +1
                                       cifs_handle_standard
                                         handle_mid
                                          /* bigbuf will also leak */
                                          mid->resp_buf = server->bigbuf
                                          server->bigbuf = NULL;
                                          dequeue_mid
                                      /* in for loop */
     /* Signal interrupts wait: rc = -ERESTARTSYS */
                                     spin_lock(&midQ[i]->mid_lock);
                                     mids[0]->callback
                                       cifs_compound_callback
                                        /* The change comes too late */
                                        mid->mid_state = MID_RESPONSE_READY
                                     spin_unlock(&midQ[i]->mid_lock);
     spin_lock(&midQ[i]->mid_lock);
     /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) Not satisfied */
     spin_unlock(&midQ[i]->mid_lock);
                                     release_mid  // -1
     release_mid  // -1
```

Or, in the case where the callback is replaced:

User foreground process                    cifsd
cifs_readdir
  open_cached_dir
   cifs_send_recv
    compound_send_recv
     smb2_setup_request
      smb2_mid_entry_alloc
       smb2_get_mid_entry
        smb2_mid_entry_alloc
         mempool_alloc // alloc mid
         kref_init(&temp->refcount); // refcount = 1
      mid[0]->callback = cifs_compound_callback;
      mid[1]->callback = cifs_compound_last_callback;
      smb_send_rqst
      rc = wait_for_response
       wait_event_state TASK_KILLABLE
                                   cifs_demultiplex_thread
                                     allocate_buffers
                                       server->bigbuf = cifs_buf_get()
                                     standard_receive3
                                       ->find_mid()
                                         smb2_find_mid
                                           __smb2_find_mid
                                            kref_get(&mid->refcount) // +1
                                       cifs_handle_standard
                                         handle_mid
                                          /* bigbuf will also leak */
                                          mid->resp_buf = server->bigbuf
                                          server->bigbuf = NULL;
                                          dequeue_mid
                                      /* in for loop */
     /* Signal interrupts wait: rc = -ERESTARTSYS */
     spin_lock(&midQ[i]->mid_lock);
     /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) Satisfied */
     midQ[0]->callback = cifs_cancelled_callback;
     cancelled_mid[i] = true;
     spin_unlock(&midQ[i]->mid_lock);
                                     spin_lock(&midQ[i]->mid_lock);
                                     mids[0]->callback
                                       cifs_cancelled_callback
                                         cifs_compound_callback
                                          mid->mid_state = MID_RESPONSE_READY
                                         release_mid  // -1
                                      spin_unlock(&midQ[i]->mid_lock);
                                     release_mid  // -1


I know this approach is quite ugly, but it is a relatively reliable
method that preserves the historical bugfix logic.

Relevant patches:

d527f51331ca ("cifs: Fix UAF in cifs_demultiplex_thread()")
ee258d79159a ("CIFS: Move credit processing to mid callbacks for SMB3")
8544f4aa9dd1 ("CIFS: Fix credit computation for compounded requests")
d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data")


Thanks again for your reply. I’m looking forward to any further
suggestions you may have to improve the patch.

Best regards,
Wang Zhaolong

Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Steve French 2 months ago
The first three patches (cleanup) look fine and have added to
cifs-2.6.git for-next (also added Enzo Acked-by) but the fourth patch
("smb: client: fix mid_q_entry memleak leak with per-mid locking")
causes xfstest generic/001 to fail with signing enabled.  See
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/58/steps/34/logs/stdio
and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/59/steps/34/logs/stdio


[Tue Aug 5 11:03:32 2025] run fstests generic/001 at 2025-08-05 11:03:32
[Tue Aug 5 11:03:33 2025] =============================
[Tue Aug 5 11:03:33 2025] [ BUG: Invalid wait context ]
[Tue Aug 5 11:03:33 2025] 6.16.0 #1 Tainted: G E
[Tue Aug 5 11:03:33 2025] -----------------------------
[Tue Aug 5 11:03:33 2025] cifsd/24912 is trying to lock:
[Tue Aug 5 11:03:33 2025] ffffffffafc14630
(crypto_alg_sem){++++}-{4:4}, at: crypto_alg_lookup+0x40/0x120
[Tue Aug 5 11:03:33 2025] other info that might help us debug this:
[Tue Aug 5 11:03:33 2025] context-{5:5}
[Tue Aug 5 11:03:33 2025] 1 lock held by cifsd/24912:
[Tue Aug 5 11:03:33 2025] #0: ff11000134c25870
(&temp->mid_lock){+.+.}-{3:3}, at: mid_execute_callback+0x19/0x40
[cifs]
[Tue Aug 5 11:03:33 2025] stack backtrace:
[Tue Aug 5 11:03:33 2025] CPU: 1 UID: 0 PID: 24912 Comm: cifsd
Tainted: G E 6.16.0 #1 PREEMPT(voluntary)
[Tue Aug 5 11:03:33 2025] Tainted: [E]=UNSIGNED_MODULE
[Tue Aug 5 11:03:33 2025] Hardware name: Red Hat KVM, BIOS
1.16.3-4.el9 04/01/2014
[Tue Aug 5 11:03:33 2025] Call Trace:
[Tue Aug 5 11:03:33 2025] <TASK>
[Tue Aug 5 11:03:33 2025] dump_stack_lvl+0x79/0xb0
[Tue Aug 5 11:03:33 2025] __lock_acquire+0xace/0x21c0
[Tue Aug 5 11:03:33 2025] ? check_irq_usage+0xa4/0xa80
[Tue Aug 5 11:03:33 2025] lock_acquire+0x143/0x2d0
[Tue Aug 5 11:03:33 2025] ? crypto_alg_lookup+0x40/0x120
[Tue Aug 5 11:03:33 2025] ? check_noncircular+0x71/0x120
[Tue Aug 5 11:03:33 2025] down_read+0x7c/0x2e0
[Tue Aug 5 11:03:33 2025] ? crypto_alg_lookup+0x40/0x120
[Tue Aug 5 11:03:33 2025] ? __pfx_down_read+0x10/0x10
[Tue Aug 5 11:03:33 2025] ? lockdep_unlock+0x51/0xc0
[Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x11ee/0x21c0
[Tue Aug 5 11:03:33 2025] crypto_alg_lookup+0x40/0x120
[Tue Aug 5 11:03:33 2025] crypto_alg_mod_lookup+0x53/0x2b0
[Tue Aug 5 11:03:33 2025] crypto_alloc_tfm_node+0x76/0x130
[Tue Aug 5 11:03:33 2025] cifs_alloc_hash+0x44/0x130 [cifs]
[Tue Aug 5 11:03:33 2025] smb3_calc_signature+0x4f0/0x7b0 [cifs]
[Tue Aug 5 11:03:33 2025] ? __pfx_smb3_calc_signature+0x10/0x10 [cifs]
[Tue Aug 5 11:03:33 2025] ? find_held_lock+0x2b/0x80
[Tue Aug 5 11:03:33 2025] ? tcp_recvmsg+0xc9/0x2d0
[Tue Aug 5 11:03:33 2025] ? rcu_is_watching+0x20/0x50
[Tue Aug 5 11:03:33 2025] ? trace_irq_enable.constprop.0+0xac/0xe0
[Tue Aug 5 11:03:33 2025] ? tcp_recvmsg+0xc9/0x2d0
[Tue Aug 5 11:03:33 2025] ? __local_bh_enable_ip+0x90/0xf0
[Tue Aug 5 11:03:33 2025] ? sock_has_perm+0x97/0x1a0
[Tue Aug 5 11:03:33 2025] smb2_verify_signature+0x178/0x290 [cifs]
[Tue Aug 5 11:03:33 2025] ? __pfx_smb2_verify_signature+0x10/0x10 [cifs]
[Tue Aug 5 11:03:33 2025] ? look_up_lock_class+0x5d/0x140
[Tue Aug 5 11:03:33 2025] smb2_check_receive+0x154/0x1c0 [cifs]
[Tue Aug 5 11:03:33 2025] ? __pfx_smb2_check_receive+0x10/0x10 [cifs]
[Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x3f1/0x21c0
[Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x3f1/0x21c0
[Tue Aug 5 11:03:33 2025] smb2_writev_callback+0x1f2/0x870 [cifs]
[Tue Aug 5 11:03:33 2025] ? lock_acquire+0x143/0x2d0
[Tue Aug 5 11:03:33 2025] ? mid_execute_callback+0x19/0x40 [cifs]
[Tue Aug 5 11:03:33 2025] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[Tue Aug 5 11:03:33 2025] ? do_raw_spin_lock+0x10c/0x190
[Tue Aug 5 11:03:33 2025] ? __pfx_do_raw_spin_lock+0x10/0x10
[Tue Aug 5 11:03:33 2025] ? _raw_spin_unlock+0x23/0x40
[Tue Aug 5 11:03:33 2025] mid_execute_callback+0x33/0x40 [cifs]
[Tue Aug 5 11:03:33 2025] cifs_demultiplex_thread+0xc95/0x15e0 [cifs]
[Tue Aug 5 11:03:33 2025] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
[Tue Aug 5 11:03:33 2025] ? find_held_lock+0x2b/0x80
[Tue Aug 5 11:03:33 2025] ? __kthread_parkme+0x4b/0xd0
[Tue Aug 5 11:03:33 2025] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
[Tue Aug 5 11:03:33 2025] kthread+0x216/0x3e0
[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
[Tue Aug 5 11:03:33 2025] ? lock_release+0xc4/0x270
[Tue Aug 5 11:03:33 2025] ? rcu_is_watching+0x20/0x50
[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
[Tue Aug 5 11:03:33 2025] ret_from_fork+0x23a/0x2e0
[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
[Tue Aug 5 11:03:33 2025] ret_from_fork_asm+0x1a/0x30
[Tue Aug 5 11:03:33 2025] </TASK>

(it worked without the patch see e.g.
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/60
and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/56)

On Tue, Aug 5, 2025 at 1:54 AM Wang Zhaolong
<wangzhaolong@huaweicloud.com> wrote:
>
> I've been investigating a pretty nasty memory leak in the SMB client. When
> compound requests get interrupted by signals, we end up with mid_q_entry
> structures and server buffers that never get freed[1].
>
> User foreground process                    cifsd
> cifs_readdir
>  open_cached_dir
>   cifs_send_recv
>    compound_send_recv
>     smb2_setup_request
>      smb2_mid_entry_alloc
>       smb2_get_mid_entry
>        smb2_mid_entry_alloc
>         mempool_alloc // alloc mid
>         kref_init(&temp->refcount); // refcount = 1
>      mid[0]->callback = cifs_compound_callback;
>      mid[1]->callback = cifs_compound_last_callback;
>      smb_send_rqst
>      rc = wait_for_response
>       wait_event_state TASK_KILLABLE
>                                   cifs_demultiplex_thread
>                                     allocate_buffers
>                                       server->bigbuf = cifs_buf_get()
>                                     standard_receive3
>                                       ->find_mid()
>                                         smb2_find_mid
>                                           __smb2_find_mid
>                                            kref_get(&mid->refcount) // +1
>                                       cifs_handle_standard
>                                         handle_mid
>                                          /* bigbuf will also leak */
>                                          mid->resp_buf = server->bigbuf
>                                          server->bigbuf = NULL;
>                                          dequeue_mid
>                                      /* in for loop */
>                                     mids[0]->callback
>                                       cifs_compound_callback
>     /* Signal interrupts wait: rc = -ERESTARTSYS */
>     /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *?
>     midQ[0]->callback = cifs_cancelled_callback;
>     cancelled_mid[i] = true;
>                                        /* The change comes too late */
>                                        mid->mid_state = MID_RESPONSE_READY
>                                     release_mid  // -1
>     /* cancelled_mid[i] == true causes mid won't be released
>        in compound_send_recv cleanup */
>     /* cifs_cancelled_callback won't executed to release mid */
>
> The core issue is a race condition where cifs_cancelled_callback never
> gets a chance to run, so cleanup never happens. I've spent quite a bit
> of time trying to understand how to fix this safely.
>
> Honestly, my first instinct was to just patch the callback assignment
> logic directly. But the more I dug into it, the more I realized that
> the current locking scheme makes this really tricky to do safely. We
> have one big lock protecting multiple different things, and trying to
> fix the race condition directly felt like playing with fire.
>
> I kept running into scenarios where a "simple" fix could introduce
> deadlocks or new race conditions. After looking at this from different
> angles, I came to the conclusion that I needed to refactor the locking
> first to create a safe foundation for the actual fix.
>
> Patches 1-3 are foundational refactoring. These three patches rename
> locks for clarity, separate counter protection from queue operations,
> and replace the confusing mid_flags bitmask with explicit boolean
> fields. Basically, they untangle the current locking mess so I can
> implement the real fix without breaking anything.
>
> The 4th patch in the series is where the real fix happens. With
> the previous refactoring in place, I could safely add a lock to each
> mid_q_entry and implement atomic callback execution. This eliminates
> the race condition that was causing the leaks.
>
> In summary, my approach to the fix is to use smaller-grained locking to
> avoid race conditions. However, during the implementation process,
> this approach involves more changes than I initially hoped for. If
> there's a simpler or more elegant way to fix this race condition that
> I've missed, I'd love to hear about it. I've tried to be thorough in
> my analysis, but I know there are folks with more experience in this
> codebase who might see a better path.
>
> V1 -> V2:
>   - Inline the mid_execute_callback() in the smb2ops.c to eliminate
>     the sparse warning.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404 [1]
>
> Wang Zhaolong (4):
>   smb: client: rename server mid_lock to mid_queue_lock
>   smb: client: add mid_counter_lock to protect the mid counter counter
>   smb: client: smb: client: eliminate mid_flags field
>   smb: client: fix mid_q_entry memleak leak with per-mid locking
>
>  fs/smb/client/cifs_debug.c    | 12 ++++--
>  fs/smb/client/cifsglob.h      | 22 ++++++-----
>  fs/smb/client/connect.c       | 57 +++++++++++++++++----------
>  fs/smb/client/smb1ops.c       | 23 +++++++----
>  fs/smb/client/smb2ops.c       | 72 +++++++++++++++++++----------------
>  fs/smb/client/smb2transport.c |  5 ++-
>  fs/smb/client/transport.c     | 71 ++++++++++++++++++----------------
>  7 files changed, 152 insertions(+), 110 deletions(-)
>
> --
> 2.39.2
>
>


-- 
Thanks,

Steve
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Enzo Matsumiya 2 months ago
On 08/05, Steve French wrote:
>The first three patches (cleanup) look fine and have added to
>cifs-2.6.git for-next (also added Enzo Acked-by) but the fourth patch
>("smb: client: fix mid_q_entry memleak leak with per-mid locking")
>causes xfstest generic/001 to fail with signing enabled.  See
>http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/58/steps/34/logs/stdio
>and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/59/steps/34/logs/stdio

Was about to reply here as I was testing (an unrelated patch) with generic/100
and got the same backtrace.

@Wang btw sorry I missed your reproducer in the bugzilla link, I'll take
a look.  Thanks!

>[Tue Aug 5 11:03:32 2025] run fstests generic/001 at 2025-08-05 11:03:32
>[Tue Aug 5 11:03:33 2025] =============================
>[Tue Aug 5 11:03:33 2025] [ BUG: Invalid wait context ]
>[Tue Aug 5 11:03:33 2025] 6.16.0 #1 Tainted: G E
>[Tue Aug 5 11:03:33 2025] -----------------------------
>[Tue Aug 5 11:03:33 2025] cifsd/24912 is trying to lock:
>[Tue Aug 5 11:03:33 2025] ffffffffafc14630
>(crypto_alg_sem){++++}-{4:4}, at: crypto_alg_lookup+0x40/0x120
>[Tue Aug 5 11:03:33 2025] other info that might help us debug this:
>[Tue Aug 5 11:03:33 2025] context-{5:5}
>[Tue Aug 5 11:03:33 2025] 1 lock held by cifsd/24912:
>[Tue Aug 5 11:03:33 2025] #0: ff11000134c25870
>(&temp->mid_lock){+.+.}-{3:3}, at: mid_execute_callback+0x19/0x40
>[cifs]
>[Tue Aug 5 11:03:33 2025] stack backtrace:
>[Tue Aug 5 11:03:33 2025] CPU: 1 UID: 0 PID: 24912 Comm: cifsd
>Tainted: G E 6.16.0 #1 PREEMPT(voluntary)
>[Tue Aug 5 11:03:33 2025] Tainted: [E]=UNSIGNED_MODULE
>[Tue Aug 5 11:03:33 2025] Hardware name: Red Hat KVM, BIOS
>1.16.3-4.el9 04/01/2014
>[Tue Aug 5 11:03:33 2025] Call Trace:
>[Tue Aug 5 11:03:33 2025] <TASK>
>[Tue Aug 5 11:03:33 2025] dump_stack_lvl+0x79/0xb0
>[Tue Aug 5 11:03:33 2025] __lock_acquire+0xace/0x21c0
>[Tue Aug 5 11:03:33 2025] ? check_irq_usage+0xa4/0xa80
>[Tue Aug 5 11:03:33 2025] lock_acquire+0x143/0x2d0
>[Tue Aug 5 11:03:33 2025] ? crypto_alg_lookup+0x40/0x120
>[Tue Aug 5 11:03:33 2025] ? check_noncircular+0x71/0x120
>[Tue Aug 5 11:03:33 2025] down_read+0x7c/0x2e0
>[Tue Aug 5 11:03:33 2025] ? crypto_alg_lookup+0x40/0x120
>[Tue Aug 5 11:03:33 2025] ? __pfx_down_read+0x10/0x10
>[Tue Aug 5 11:03:33 2025] ? lockdep_unlock+0x51/0xc0
>[Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x11ee/0x21c0
>[Tue Aug 5 11:03:33 2025] crypto_alg_lookup+0x40/0x120
>[Tue Aug 5 11:03:33 2025] crypto_alg_mod_lookup+0x53/0x2b0
>[Tue Aug 5 11:03:33 2025] crypto_alloc_tfm_node+0x76/0x130
>[Tue Aug 5 11:03:33 2025] cifs_alloc_hash+0x44/0x130 [cifs]
>[Tue Aug 5 11:03:33 2025] smb3_calc_signature+0x4f0/0x7b0 [cifs]
>[Tue Aug 5 11:03:33 2025] ? __pfx_smb3_calc_signature+0x10/0x10 [cifs]
>[Tue Aug 5 11:03:33 2025] ? find_held_lock+0x2b/0x80
>[Tue Aug 5 11:03:33 2025] ? tcp_recvmsg+0xc9/0x2d0
>[Tue Aug 5 11:03:33 2025] ? rcu_is_watching+0x20/0x50
>[Tue Aug 5 11:03:33 2025] ? trace_irq_enable.constprop.0+0xac/0xe0
>[Tue Aug 5 11:03:33 2025] ? tcp_recvmsg+0xc9/0x2d0
>[Tue Aug 5 11:03:33 2025] ? __local_bh_enable_ip+0x90/0xf0
>[Tue Aug 5 11:03:33 2025] ? sock_has_perm+0x97/0x1a0
>[Tue Aug 5 11:03:33 2025] smb2_verify_signature+0x178/0x290 [cifs]
>[Tue Aug 5 11:03:33 2025] ? __pfx_smb2_verify_signature+0x10/0x10 [cifs]
>[Tue Aug 5 11:03:33 2025] ? look_up_lock_class+0x5d/0x140
>[Tue Aug 5 11:03:33 2025] smb2_check_receive+0x154/0x1c0 [cifs]
>[Tue Aug 5 11:03:33 2025] ? __pfx_smb2_check_receive+0x10/0x10 [cifs]
>[Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x3f1/0x21c0
>[Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x3f1/0x21c0
>[Tue Aug 5 11:03:33 2025] smb2_writev_callback+0x1f2/0x870 [cifs]
>[Tue Aug 5 11:03:33 2025] ? lock_acquire+0x143/0x2d0
>[Tue Aug 5 11:03:33 2025] ? mid_execute_callback+0x19/0x40 [cifs]
>[Tue Aug 5 11:03:33 2025] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
>[Tue Aug 5 11:03:33 2025] ? do_raw_spin_lock+0x10c/0x190
>[Tue Aug 5 11:03:33 2025] ? __pfx_do_raw_spin_lock+0x10/0x10
>[Tue Aug 5 11:03:33 2025] ? _raw_spin_unlock+0x23/0x40
>[Tue Aug 5 11:03:33 2025] mid_execute_callback+0x33/0x40 [cifs]
>[Tue Aug 5 11:03:33 2025] cifs_demultiplex_thread+0xc95/0x15e0 [cifs]
>[Tue Aug 5 11:03:33 2025] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
>[Tue Aug 5 11:03:33 2025] ? find_held_lock+0x2b/0x80
>[Tue Aug 5 11:03:33 2025] ? __kthread_parkme+0x4b/0xd0
>[Tue Aug 5 11:03:33 2025] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
>[Tue Aug 5 11:03:33 2025] kthread+0x216/0x3e0
>[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
>[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
>[Tue Aug 5 11:03:33 2025] ? lock_release+0xc4/0x270
>[Tue Aug 5 11:03:33 2025] ? rcu_is_watching+0x20/0x50
>[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
>[Tue Aug 5 11:03:33 2025] ret_from_fork+0x23a/0x2e0
>[Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
>[Tue Aug 5 11:03:33 2025] ret_from_fork_asm+0x1a/0x30
>[Tue Aug 5 11:03:33 2025] </TASK>
>
>(it worked without the patch see e.g.
>http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/60
>and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/56)
>
>On Tue, Aug 5, 2025 at 1:54 AM Wang Zhaolong
><wangzhaolong@huaweicloud.com> wrote:
>>
>> I've been investigating a pretty nasty memory leak in the SMB client. When
>> compound requests get interrupted by signals, we end up with mid_q_entry
>> structures and server buffers that never get freed[1].
>>
>> User foreground process                    cifsd
>> cifs_readdir
>>  open_cached_dir
>>   cifs_send_recv
>>    compound_send_recv
>>     smb2_setup_request
>>      smb2_mid_entry_alloc
>>       smb2_get_mid_entry
>>        smb2_mid_entry_alloc
>>         mempool_alloc // alloc mid
>>         kref_init(&temp->refcount); // refcount = 1
>>      mid[0]->callback = cifs_compound_callback;
>>      mid[1]->callback = cifs_compound_last_callback;
>>      smb_send_rqst
>>      rc = wait_for_response
>>       wait_event_state TASK_KILLABLE
>>                                   cifs_demultiplex_thread
>>                                     allocate_buffers
>>                                       server->bigbuf = cifs_buf_get()
>>                                     standard_receive3
>>                                       ->find_mid()
>>                                         smb2_find_mid
>>                                           __smb2_find_mid
>>                                            kref_get(&mid->refcount) // +1
>>                                       cifs_handle_standard
>>                                         handle_mid
>>                                          /* bigbuf will also leak */
>>                                          mid->resp_buf = server->bigbuf
>>                                          server->bigbuf = NULL;
>>                                          dequeue_mid
>>                                      /* in for loop */
>>                                     mids[0]->callback
>>                                       cifs_compound_callback
>>     /* Signal interrupts wait: rc = -ERESTARTSYS */
>>     /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) */
>>     midQ[0]->callback = cifs_cancelled_callback;
>>     cancelled_mid[i] = true;
>>                                        /* The change comes too late */
>>                                        mid->mid_state = MID_RESPONSE_READY
>>                                     release_mid  // -1
>>     /* cancelled_mid[i] == true means the mid won't be released
>>        in the compound_send_recv cleanup */
>>     /* cifs_cancelled_callback won't be executed to release the mid */
>>
>> The core issue is a race condition where cifs_cancelled_callback never
>> gets a chance to run, so cleanup never happens. I've spent quite a bit
>> of time trying to understand how to fix this safely.
>>
>> Honestly, my first instinct was to just patch the callback assignment
>> logic directly. But the more I dug into it, the more I realized that
>> the current locking scheme makes this really tricky to do safely. We
>> have one big lock protecting multiple different things, and trying to
>> fix the race condition directly felt like playing with fire.
>>
>> I kept running into scenarios where a "simple" fix could introduce
>> deadlocks or new race conditions. After looking at this from different
>> angles, I came to the conclusion that I needed to refactor the locking
>> first to create a safe foundation for the actual fix.
>>
>> Patches 1-3 are foundational refactoring. These three patches rename
>> locks for clarity, separate counter protection from queue operations,
>> and replace the confusing mid_flags bitmask with explicit boolean
>> fields. Basically, they untangle the current locking mess so I can
>> implement the real fix without breaking anything.
>>
>> The 4th patch in the series is where the real fix happens. With
>> the previous refactoring in place, I could safely add a lock to each
>> mid_q_entry and implement atomic callback execution. This eliminates
>> the race condition that was causing the leaks.
>>
>> In summary, my approach to the fix is to use smaller-grained locking to
>> avoid race conditions. However, during the implementation process,
>> this approach involves more changes than I initially hoped for. If
>> there's a simpler or more elegant way to fix this race condition that
>> I've missed, I'd love to hear about it. I've tried to be thorough in
>> my analysis, but I know there are folks with more experience in this
>> codebase who might see a better path.
>>
>> V1 -> V2:
>>   - Inline the mid_execute_callback() in the smb2ops.c to eliminate
>>     the sparse warning.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404 [1]
>>
>> Wang Zhaolong (4):
>>   smb: client: rename server mid_lock to mid_queue_lock
>>   smb: client: add mid_counter_lock to protect the mid counter
>>   smb: client: eliminate mid_flags field
>>   smb: client: fix mid_q_entry memory leak with per-mid locking
>>
>>  fs/smb/client/cifs_debug.c    | 12 ++++--
>>  fs/smb/client/cifsglob.h      | 22 ++++++-----
>>  fs/smb/client/connect.c       | 57 +++++++++++++++++----------
>>  fs/smb/client/smb1ops.c       | 23 +++++++----
>>  fs/smb/client/smb2ops.c       | 72 +++++++++++++++++++----------------
>>  fs/smb/client/smb2transport.c |  5 ++-
>>  fs/smb/client/transport.c     | 71 ++++++++++++++++++----------------
>>  7 files changed, 152 insertions(+), 110 deletions(-)
>>
>> --
>> 2.39.2
>>
>>
>
>
>-- 
>Thanks,
>
>Steve
>
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Enzo Matsumiya 2 months ago
Hi Wang,

On 08/05, Enzo Matsumiya wrote:
>On 08/05, Steve French wrote:
>>The first three patches (cleanup) look fine and have added to
>>cifs-2.6.git for-next (also added Enzo Acked-by) but the fourth patch
>>("smb: client: fix mid_q_entry memory leak with per-mid locking")
>>causes xfstest generic/001 to fail with signing enabled.  See
>>http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/58/steps/34/logs/stdio
>>and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/59/steps/34/logs/stdio
>
>Was about to reply here as I was testing (an unrelated patch) with generic/100
>and got the same backtrace.
>
>@Wang btw sorry I missed your reproducer in the bugzilla link, I'll take
>a look.  Thanks!

So, strangely, your poc.c doesn't produce the backtrace (invalid wait
context) that Steve mentions.

And while I could confirm the original mid leak and that your fix works,
now I get this in kmemleak:

...
unreferenced object 0xffff8881064edf00 (size 192):
   comm "poc", pid 36032, jiffies 4294813121
   hex dump (first 32 bytes):
     00 df 4e 06 81 88 ff ff 00 df 4e 06 81 88 ff ff  ..N.......N.....
     01 00 00 00 00 00 00 00 00 50 6d 03 81 88 ff ff  .........Pm.....
   backtrace (crc fc0a60b2):
     kmem_cache_alloc_noprof+0x2f4/0x3f0
     mempool_alloc_noprof+0x6e/0x1c0
     generate_smb3signingkey+0x17f/0x2c0 [cifs]
     smb2_verify_signature+0x110/0x170 [cifs]
     compound_send_recv+0x11/0xb90 [cifs]
     compound_send_recv+0x983/0xb90 [cifs]
     SMB2_close_init+0x9f/0xc0 [cifs]
     cifs_readdir+0x44/0x1450 [cifs]
     iterate_dir+0x8a/0x160
     __x64_sys_getdents+0x7b/0x120
     do_syscall_64+0x6a/0x2d0
     entry_SYSCALL_64_after_hwframe+0x76/0x7e


From a quick look at the code, I couldn't make sense of it, and even
less so _how_ your patch could be causing this.  But it's certainly
doing something that is impacting the signing crypto TFM.


Cheers,

Enzo

>>[... Steve's lockdep report, links, and the quoted cover letter trimmed; identical copies appear earlier in the thread ...]
>
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Wang Zhaolong 2 months ago



> The first three patches (cleanup) look fine and have added to
> cifs-2.6.git for-next (also added Enzo Acked-by) but the fourth patch
> ("smb: client: fix mid_q_entry memory leak with per-mid locking")
> causes xfstest generic/001 to fail with signing enabled.  See
> http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/58/steps/34/logs/stdio
> and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/59/steps/34/logs/stdio
> 

I am unable to view any information in the link above. Is this information
only visible to logged-in users?


> 
> [... lockdep "Invalid wait context" backtrace trimmed; identical to the copy quoted above ...]
> 
> (it worked without the patch see e.g.
> http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/60
> and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/56)
> 
> On Tue, Aug 5, 2025 at 1:54 AM Wang Zhaolong
> <wangzhaolong@huaweicloud.com> wrote:


It's quite strange that the lock reported in the stack trace is an internal
lock of the crypto module, which only protects crypto's internal state.
Moreover, I have not yet found a path where a callback registered by cifs
is executed within the scope of this lock.

```c
// crypto/api.c
static struct crypto_alg *crypto_alg_lookup(const char *name, u32 type,
					    u32 mask)
{
	const u32 fips = CRYPTO_ALG_FIPS_INTERNAL;
	struct crypto_alg *alg;
	u32 test = 0;

	if (!((type | mask) & CRYPTO_ALG_TESTED))
		test |= CRYPTO_ALG_TESTED;

	down_read(&crypto_alg_sem);
	...
	up_read(&crypto_alg_sem);
	return alg;
```
More information is needed to confirm this issue. Could you please provide it?

Best regards,
Wang Zhaolong

Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Wang Zhaolong 1 month, 4 weeks ago
Sorry for the delayed response. I can see exactly what went wrong now.

The issue is that my implementation holds a spinlock (mid_lock) while
executing the callback, but the callback path can eventually reach
crypto_alg_lookup(), which tries to acquire a semaphore. This violates the
kernel's locking rules: we cannot sleep while holding a spinlock.

I should consider a design that can safely handle these cross-subsystem
interactions.

I'll rework the patch to fix this locking issue and send a v3. I'll
probably need to rethink the whole locking strategy to be more aware
of what the callbacks actually do and what they might need to sleep for.

Best regards,
Wang Zhaolong


> [... quoted earlier exchange trimmed; the messages appear in full above ...]
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Steve French 1 month, 4 weeks ago
Presumably the first three cleanup patches are OK - but let me know if there are objections.

On Thu, Aug 7, 2025 at 9:43 AM Wang Zhaolong
<wangzhaolong@huaweicloud.com> wrote:
>
>
> Sorry for the delayed response. I can see exactly what went wrong now.
>
> The issue is that my implementation holds a spinlock (mid_lock) while
> executing the callback, but the callback path can eventually reach
> crypto_alg_lookup(), which tries to acquire a semaphore. This violates
> the kernel's locking rules: we cannot sleep while holding a spinlock.
>
> I'll need a more careful design that can safely handle these
> cross-subsystem interactions.
>
> I'll rework the patch to fix this locking issue and send a v3, and
> rethink the locking strategy around what the callbacks actually do
> and where they might need to sleep.
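>
> In pseudocode, the broken pattern is roughly this (an illustrative
> sketch, not the actual patch):
>
> ```c
> /* v2 shape: callback invoked with a spinlock held */
> spin_lock(&mid->mid_lock);
> mid->callback(mid);  /* smb2_writev_callback -> smb3_calc_signature
>                       * -> cifs_alloc_hash -> crypto_alloc_tfm_node
>                       * -> down_read(&crypto_alg_sem): may sleep */
> spin_unlock(&mid->mid_lock);
> ```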
>
> Best regards,
> Wang Zhaolong
>
>
> >
> >> The first three patches (cleanup) look fine and have been added to
> >> cifs-2.6.git for-next (also added Enzo's Acked-by) but the fourth patch
> >> ("smb: client: fix mid_q_entry memleak leak with per-mid locking")
> >> causes xfstest generic/001 to fail with signing enabled.  See
> >> http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/58/steps/34/logs/stdio
> >> and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/59/steps/34/logs/stdio
> >>
> >
> > I am unable to view any information at the links above. Is this
> > information only visible to logged-in users?
> >
> >
> >>
> >> [Tue Aug 5 11:03:32 2025] run fstests generic/001 at 2025-08-05 11:03:32
> >> [Tue Aug 5 11:03:33 2025] =============================
> >> [Tue Aug 5 11:03:33 2025] [ BUG: Invalid wait context ]
> >> [Tue Aug 5 11:03:33 2025] 6.16.0 #1 Tainted: G E
> >> [Tue Aug 5 11:03:33 2025] -----------------------------
> >> [Tue Aug 5 11:03:33 2025] cifsd/24912 is trying to lock:
> >> [Tue Aug 5 11:03:33 2025] ffffffffafc14630
> >> (crypto_alg_sem){++++}-{4:4}, at: crypto_alg_lookup+0x40/0x120
> >> [Tue Aug 5 11:03:33 2025] other info that might help us debug this:
> >> [Tue Aug 5 11:03:33 2025] context-{5:5}
> >> [Tue Aug 5 11:03:33 2025] 1 lock held by cifsd/24912:
> >> [Tue Aug 5 11:03:33 2025] #0: ff11000134c25870
> >> (&temp->mid_lock){+.+.}-{3:3}, at: mid_execute_callback+0x19/0x40
> >> [cifs]
> >> [Tue Aug 5 11:03:33 2025] stack backtrace:
> >> [Tue Aug 5 11:03:33 2025] CPU: 1 UID: 0 PID: 24912 Comm: cifsd
> >> Tainted: G E 6.16.0 #1 PREEMPT(voluntary)
> >> [Tue Aug 5 11:03:33 2025] Tainted: [E]=UNSIGNED_MODULE
> >> [Tue Aug 5 11:03:33 2025] Hardware name: Red Hat KVM, BIOS
> >> 1.16.3-4.el9 04/01/2014
> >> [Tue Aug 5 11:03:33 2025] Call Trace:
> >> [Tue Aug 5 11:03:33 2025] <TASK>
> >> [Tue Aug 5 11:03:33 2025] dump_stack_lvl+0x79/0xb0
> >> [Tue Aug 5 11:03:33 2025] __lock_acquire+0xace/0x21c0
> >> [Tue Aug 5 11:03:33 2025] ? check_irq_usage+0xa4/0xa80
> >> [Tue Aug 5 11:03:33 2025] lock_acquire+0x143/0x2d0
> >> [Tue Aug 5 11:03:33 2025] ? crypto_alg_lookup+0x40/0x120
> >> [Tue Aug 5 11:03:33 2025] ? check_noncircular+0x71/0x120
> >> [Tue Aug 5 11:03:33 2025] down_read+0x7c/0x2e0
> >> [Tue Aug 5 11:03:33 2025] ? crypto_alg_lookup+0x40/0x120
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_down_read+0x10/0x10
> >> [Tue Aug 5 11:03:33 2025] ? lockdep_unlock+0x51/0xc0
> >> [Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x11ee/0x21c0
> >> [Tue Aug 5 11:03:33 2025] crypto_alg_lookup+0x40/0x120
> >> [Tue Aug 5 11:03:33 2025] crypto_alg_mod_lookup+0x53/0x2b0
> >> [Tue Aug 5 11:03:33 2025] crypto_alloc_tfm_node+0x76/0x130
> >> [Tue Aug 5 11:03:33 2025] cifs_alloc_hash+0x44/0x130 [cifs]
> >> [Tue Aug 5 11:03:33 2025] smb3_calc_signature+0x4f0/0x7b0 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_smb3_calc_signature+0x10/0x10 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? find_held_lock+0x2b/0x80
> >> [Tue Aug 5 11:03:33 2025] ? tcp_recvmsg+0xc9/0x2d0
> >> [Tue Aug 5 11:03:33 2025] ? rcu_is_watching+0x20/0x50
> >> [Tue Aug 5 11:03:33 2025] ? trace_irq_enable.constprop.0+0xac/0xe0
> >> [Tue Aug 5 11:03:33 2025] ? tcp_recvmsg+0xc9/0x2d0
> >> [Tue Aug 5 11:03:33 2025] ? __local_bh_enable_ip+0x90/0xf0
> >> [Tue Aug 5 11:03:33 2025] ? sock_has_perm+0x97/0x1a0
> >> [Tue Aug 5 11:03:33 2025] smb2_verify_signature+0x178/0x290 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_smb2_verify_signature+0x10/0x10 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? look_up_lock_class+0x5d/0x140
> >> [Tue Aug 5 11:03:33 2025] smb2_check_receive+0x154/0x1c0 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_smb2_check_receive+0x10/0x10 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x3f1/0x21c0
> >> [Tue Aug 5 11:03:33 2025] ? __lock_acquire+0x3f1/0x21c0
> >> [Tue Aug 5 11:03:33 2025] smb2_writev_callback+0x1f2/0x870 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? lock_acquire+0x143/0x2d0
> >> [Tue Aug 5 11:03:33 2025] ? mid_execute_callback+0x19/0x40 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? do_raw_spin_lock+0x10c/0x190
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_do_raw_spin_lock+0x10/0x10
> >> [Tue Aug 5 11:03:33 2025] ? _raw_spin_unlock+0x23/0x40
> >> [Tue Aug 5 11:03:33 2025] mid_execute_callback+0x33/0x40 [cifs]
> >> [Tue Aug 5 11:03:33 2025] cifs_demultiplex_thread+0xc95/0x15e0 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> >> [Tue Aug 5 11:03:33 2025] ? find_held_lock+0x2b/0x80
> >> [Tue Aug 5 11:03:33 2025] ? __kthread_parkme+0x4b/0xd0
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> >> [Tue Aug 5 11:03:33 2025] kthread+0x216/0x3e0
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
> >> [Tue Aug 5 11:03:33 2025] ? lock_release+0xc4/0x270
> >> [Tue Aug 5 11:03:33 2025] ? rcu_is_watching+0x20/0x50
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
> >> [Tue Aug 5 11:03:33 2025] ret_from_fork+0x23a/0x2e0
> >> [Tue Aug 5 11:03:33 2025] ? __pfx_kthread+0x10/0x10
> >> [Tue Aug 5 11:03:33 2025] ret_from_fork_asm+0x1a/0x30
> >> [Tue Aug 5 11:03:33 2025] </TASK>
> >>
> >> [...]


-- 
Thanks,

Steve
Re: [PATCH V2 0/4] Fix mid_q_entry memory leaks in SMB client
Posted by Enzo Matsumiya 1 month, 4 weeks ago
On 08/06, Wang Zhaolong wrote:
>>The first three patches (cleanup) look fine and have been added to
>>cifs-2.6.git for-next (also added Enzo's Acked-by) but the fourth patch
>>("smb: client: fix mid_q_entry memleak leak with per-mid locking")
>>causes xfstest generic/001 to fail with signing enabled.  See
>>http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/58/steps/34/logs/stdio
>>and http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/59/steps/34/logs/stdio
>>
>
>I am unable to view any information in the link above. Is this information
>only visible to logged-in users?

That one is publicly visible.

If you're using a Chrome-based browser, you might need to whitelist the
site, though, as it doesn't use HTTPS.

> [...]
>More information is needed to confirm this issue. Could you please provide it?

In summary, the problem is in mid_execute_callback() when the callback is
smb2_writev_callback.

Cleaning it up for clarity:

---- begin ----
     cifsd/24912 is trying to lock crypto_alg_sem at crypto_alg_lookup+0x40/0x120
     cifsd/24912 is holding &temp->mid_lock at mid_execute_callback+0x19/0x40

     Reversed call trace:
     cifs_demultiplex_thread
       mid_execute_callback
         <lock mid_lock>
         smb2_writev_callback
           smb2_check_receive
             smb2_verify_signature
               smb3_calc_signature
                 cifs_alloc_hash
                   crypto_alloc_tfm_node
                     crypto_alg_mod_lookup
                       crypto_alg_lookup
                         down_read
                           lock_acquire
                             __lock_acquire
                               >>> BUG() here <<<
         ...
         <unlock mid_lock>
---- end ----

I confirmed this before the patch I sent you privately yesterday, by
taking mid_lock directly only around cifs_compound*callback and
cifs_wake_up_task.

Also, make sure to use a reproducer/tester that exercises writev, which
your poc.c from the original bug report doesn't.

HTH


Cheers,

Enzo