[v2] tty: tty_jobctrl: fix pid memleak in disassociate_ctty()

[PATCH v2] tty: tty_jobctrl: fix pid memleak in disassociate_ctty()

Posted by Yi Yang 2 years, 6 months ago

There is a pid leakage:
------------------------------
unreferenced object 0xffff88810c181940 (size 224):
  comm "sshd", pid 8191, jiffies 4294946950 (age 524.570s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 ad 4e ad de  .............N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<ffffffff814774e6>] kmem_cache_alloc+0x5c6/0x9b0
    [<ffffffff81177342>] alloc_pid+0x72/0x570
    [<ffffffff81140ac4>] copy_process+0x1374/0x2470
    [<ffffffff81141d77>] kernel_clone+0xb7/0x900
    [<ffffffff81142645>] __se_sys_clone+0x85/0xb0
    [<ffffffff8114269b>] __x64_sys_clone+0x2b/0x30
    [<ffffffff83965a72>] do_syscall_64+0x32/0x80
    [<ffffffff83a00085>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

It turns out that there is a race condition between disassociate_ctty() and
tty_signal_session_leader(), which caused this leakage.

The pid memleak is triggered by the following race:
task[sshd]                     task[bash]
-----------------------        -----------------------
                               disassociate_ctty();
                               spin_lock_irq(&current->sighand->siglock);
                               put_pid(current->signal->tty_old_pgrp);
                               current->signal->tty_old_pgrp = NULL;
                               tty = tty_kref_get(current->signal->tty);
                               spin_unlock_irq(&current->sighand->siglock);
tty_vhangup();
tty_lock(tty);
...
tty_signal_session_leader();
spin_lock_irq(&p->sighand->siglock);
...
if (tty->pgrp) //tty->pgrp is not NULL
p->signal->tty_old_pgrp = get_pid(tty->pgrp); //An extra get
spin_unlock_irq(&p->sighand->siglock);
...
tty_unlock(tty);
                               if (tty) {
                                   tty_lock(tty);
                                   ...
                                   put_pid(tty->pgrp);
                                   tty->pgrp = NULL; // It's too late
                                   ...
                                   tty_unlock(tty);
                               }

The issue is believed to be introduced by commit c8bcd9c5be24 ("tty:
Fix ->session locking"), which moves the unlock of siglock in
disassociate_ctty() above "if (tty)", opening a small window in which
tty_signal_session_leader() can kick in. It can be easily reproduced by
adding a delay before "if (tty)" and at the entrance of
tty_signal_session_leader().

To fix this issue, we move put_pid() after "if (tty)".

Fixes: c8bcd9c5be24 ("tty: Fix ->session locking")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Co-developed-by: GUO Zihua <guozihua@huawei.com>
Signed-off-by: GUO Zihua <guozihua@huawei.com>
---
v2: Completely rework the solution: avoid the use of the PF_EXITING flag
and do put_pid() in disassociate_ctty() again instead.
---
 drivers/tty/tty_jobctrl.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 0d04287da098..17a6565f428b 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -300,12 +300,7 @@ void disassociate_ctty(int on_exit)
 		return;
 	}
 
-	spin_lock_irq(&current->sighand->siglock);
-	put_pid(current->signal->tty_old_pgrp);
-	current->signal->tty_old_pgrp = NULL;
-	tty = tty_kref_get(current->signal->tty);
-	spin_unlock_irq(&current->sighand->siglock);
-
+	tty = get_current_tty();
 	if (tty) {
 		unsigned long flags;
 
@@ -320,6 +315,11 @@ void disassociate_ctty(int on_exit)
 		tty_kref_put(tty);
 	}
 
+	spin_lock_irq(&current->sighand->siglock);
+	put_pid(current->signal->tty_old_pgrp);
+	current->signal->tty_old_pgrp = NULL;
+	spin_unlock_irq(&current->sighand->siglock);
+
 	/* Now clear signal->tty under the lock */
 	read_lock(&tasklist_lock);
 	session_clear_tty(task_session(current));
-- 
2.17.1

Re: [PATCH v2] tty: tty_jobctrl: fix pid memleak in disassociate_ctty()

Posted by Jiri Slaby 2 years, 6 months ago

On 24. 07. 23, 5:37, Yi Yang wrote:
> There is a pid leakage:
> ------------------------------
> unreferenced object 0xffff88810c181940 (size 224):
>    comm "sshd", pid 8191, jiffies 4294946950 (age 524.570s)
>    hex dump (first 32 bytes):
>      01 00 00 00 00 00 00 00 00 00 00 00 ad 4e ad de  .............N..
>      ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
>    backtrace:
>      [<ffffffff814774e6>] kmem_cache_alloc+0x5c6/0x9b0
>      [<ffffffff81177342>] alloc_pid+0x72/0x570
>      [<ffffffff81140ac4>] copy_process+0x1374/0x2470
>      [<ffffffff81141d77>] kernel_clone+0xb7/0x900
>      [<ffffffff81142645>] __se_sys_clone+0x85/0xb0
>      [<ffffffff8114269b>] __x64_sys_clone+0x2b/0x30
>      [<ffffffff83965a72>] do_syscall_64+0x32/0x80
>      [<ffffffff83a00085>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> It turns out that there is a race condition between disassociate_ctty() and
> tty_signal_session_leader(), which caused this leakage.
> 
> The pid memleak is triggered by the following race:
> task[sshd]                     task[bash]
> -----------------------        -----------------------
>                                 disassociate_ctty();
>                                 spin_lock_irq(&current->sighand->siglock);
>                                 put_pid(current->signal->tty_old_pgrp);
>                                 current->signal->tty_old_pgrp = NULL;
>                                 tty = tty_kref_get(current->signal->tty);
>                                 spin_unlock_irq(&current->sighand->siglock);
> tty_vhangup();
> tty_lock(tty);
> ...
> tty_signal_session_leader();
> spin_lock_irq(&p->sighand->siglock);
> ...
> if (tty->pgrp) //tty->pgrp is not NULL
> p->signal->tty_old_pgrp = get_pid(tty->pgrp); //An extra get
> spin_unlock_irq(&p->sighand->siglock);
> ...
> tty_unlock(tty);
>                                 if (tty) {
>                                     tty_lock(tty);
>                                     ...
>                                     put_pid(tty->pgrp);
>                                     tty->pgrp = NULL; // It's too late
>                                     ...
>                                     tty_unlock(tty);
>                                 }
> 
> The issue is believed to be introduced by commit c8bcd9c5be24 ("tty:
> Fix ->session locking"), which moves the unlock of siglock in
> disassociate_ctty() above "if (tty)", opening a small window in which
> tty_signal_session_leader() can kick in. It can be easily reproduced by
> adding a delay before "if (tty)" and at the entrance of
> tty_signal_session_leader().

Funny, the commit effectively reverted c70dbb1e79a1 ("tty: fix memleak 
in alloc_pid") which appears to be fixing exactly what you are reporting 
now again.

> To fix this issue, we move put_pid() after "if (tty)".
> 
> Fixes: c8bcd9c5be24 ("tty: Fix ->session locking")
> Signed-off-by: Yi Yang <yiyang13@huawei.com>
> Co-developed-by: GUO Zihua <guozihua@huawei.com>
> Signed-off-by: GUO Zihua <guozihua@huawei.com>
> ---
> v2: Completely rework the solution: avoid the use of the PF_EXITING flag
> and do put_pid() in disassociate_ctty() again instead.
> ---
>   drivers/tty/tty_jobctrl.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
> index 0d04287da098..17a6565f428b 100644
> --- a/drivers/tty/tty_jobctrl.c
> +++ b/drivers/tty/tty_jobctrl.c
> @@ -300,12 +300,7 @@ void disassociate_ctty(int on_exit)
>   		return;
>   	}
>   
> -	spin_lock_irq(&current->sighand->siglock);
> -	put_pid(current->signal->tty_old_pgrp);
> -	current->signal->tty_old_pgrp = NULL;
> -	tty = tty_kref_get(current->signal->tty);
> -	spin_unlock_irq(&current->sighand->siglock);
> -
> +	tty = get_current_tty();
>   	if (tty) {
>   		unsigned long flags;
>   
> @@ -320,6 +315,11 @@ void disassociate_ctty(int on_exit)
>   		tty_kref_put(tty);
>   	}
>   
> +	spin_lock_irq(&current->sighand->siglock);
> +	put_pid(current->signal->tty_old_pgrp);
> +	current->signal->tty_old_pgrp = NULL;
> +	spin_unlock_irq(&current->sighand->siglock);

It _appears_ to be correct (the locking of all this is quite hairy). But 
at the very least, this block deserves a comment why we do it the second 
time.

thanks,
-- 
js
suse labs

Re: [PATCH v2] tty: tty_jobctrl: fix pid memleak in disassociate_ctty()

Posted by Jann Horn 2 years, 6 months ago

On Mon, Jul 24, 2023 at 10:28 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> On 24. 07. 23, 5:37, Yi Yang wrote:
> > There is a pid leakage:
> > ------------------------------
> > unreferenced object 0xffff88810c181940 (size 224):
> >    comm "sshd", pid 8191, jiffies 4294946950 (age 524.570s)
> >    hex dump (first 32 bytes):
> >      01 00 00 00 00 00 00 00 00 00 00 00 ad 4e ad de  .............N..
> >      ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
> >    backtrace:
> >      [<ffffffff814774e6>] kmem_cache_alloc+0x5c6/0x9b0
> >      [<ffffffff81177342>] alloc_pid+0x72/0x570
> >      [<ffffffff81140ac4>] copy_process+0x1374/0x2470
> >      [<ffffffff81141d77>] kernel_clone+0xb7/0x900
> >      [<ffffffff81142645>] __se_sys_clone+0x85/0xb0
> >      [<ffffffff8114269b>] __x64_sys_clone+0x2b/0x30
> >      [<ffffffff83965a72>] do_syscall_64+0x32/0x80
> >      [<ffffffff83a00085>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> >
> > It turns out that there is a race condition between disassociate_ctty() and
> > tty_signal_session_leader(), which caused this leakage.
> >
> > The pid memleak is triggered by the following race:
> > task[sshd]                     task[bash]
> > -----------------------        -----------------------
> >                                 disassociate_ctty();
> >                                 spin_lock_irq(&current->sighand->siglock);
> >                                 put_pid(current->signal->tty_old_pgrp);
> >                                 current->signal->tty_old_pgrp = NULL;
> >                                 tty = tty_kref_get(current->signal->tty);
> >                                 spin_unlock_irq(&current->sighand->siglock);
> > tty_vhangup();
> > tty_lock(tty);
> > ...
> > tty_signal_session_leader();
> > spin_lock_irq(&p->sighand->siglock);
> > ...
> > if (tty->pgrp) //tty->pgrp is not NULL
> > p->signal->tty_old_pgrp = get_pid(tty->pgrp); //An extra get
> > spin_unlock_irq(&p->sighand->siglock);
> > ...
> > tty_unlock(tty);
> >                                 if (tty) {
> >                                     tty_lock(tty);
> >                                     ...
> >                                     put_pid(tty->pgrp);
> >                                     tty->pgrp = NULL; // It's too late
> >                                     ...
> >                                     tty_unlock(tty);
> >                                 }
> >
> > The issue is believed to be introduced by commit c8bcd9c5be24 ("tty:
> > Fix ->session locking"), which moves the unlock of siglock in
> > disassociate_ctty() above "if (tty)", opening a small window in which
> > tty_signal_session_leader() can kick in. It can be easily reproduced by
> > adding a delay before "if (tty)" and at the entrance of
> > tty_signal_session_leader().
>
> Funny, the commit effectively reverted c70dbb1e79a1 ("tty: fix memleak
> in alloc_pid") which appears to be fixing exactly what you are reporting
> now again.
>
> > To fix this issue, we move put_pid() after "if (tty)".
> >
> > Fixes: c8bcd9c5be24 ("tty: Fix ->session locking")
> > Signed-off-by: Yi Yang <yiyang13@huawei.com>
> > Co-developed-by: GUO Zihua <guozihua@huawei.com>
> > Signed-off-by: GUO Zihua <guozihua@huawei.com>
> > ---
> > v2: Completely rework the solution: avoid the use of the PF_EXITING flag
> > and do put_pid() in disassociate_ctty() again instead.
> > ---
> >   drivers/tty/tty_jobctrl.c | 12 ++++++------
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
> > index 0d04287da098..17a6565f428b 100644
> > --- a/drivers/tty/tty_jobctrl.c
> > +++ b/drivers/tty/tty_jobctrl.c
> > @@ -300,12 +300,7 @@ void disassociate_ctty(int on_exit)
> >               return;
> >       }
> >
> > -     spin_lock_irq(&current->sighand->siglock);
> > -     put_pid(current->signal->tty_old_pgrp);
> > -     current->signal->tty_old_pgrp = NULL;
> > -     tty = tty_kref_get(current->signal->tty);
> > -     spin_unlock_irq(&current->sighand->siglock);
> > -
> > +     tty = get_current_tty();
> >       if (tty) {
> >               unsigned long flags;
> >
> > @@ -320,6 +315,11 @@ void disassociate_ctty(int on_exit)
> >               tty_kref_put(tty);
> >       }
> >
> > +     spin_lock_irq(&current->sighand->siglock);
> > +     put_pid(current->signal->tty_old_pgrp);
> > +     current->signal->tty_old_pgrp = NULL;
> > +     spin_unlock_irq(&current->sighand->siglock);
>
> It _appears_ to be correct (the locking of all this is quite hairy). But
> at the very least, this block deserves a comment why we do it the second
> time.

What is "it" in "do it the second time"? Are you referring to calling
get_current_tty()?