[PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close

Marcel Ziswiler posted 2 patches 2 years, 2 months ago
[PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Marcel Ziswiler 2 years, 2 months ago
From: Marcel Ziswiler <marcel.ziswiler@toradex.com>

Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while
atomic. Fix this by properly purging the transmit queue and freeing the
receive skb.

Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets")

Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
---
This is the kernel trace this commit fixes:
[   29.270685] BUG: scheduling while atomic: kworker/u3:0/55/0x00000002
[   29.277062] Modules linked in: snd_soc_simple_card snd_soc_simple_card_utils crct10dif_ce btnxpuart usb_f_ncm u_ether rtc_ti_k3 k3_j72xx_bandgap mwifiex_sdio mwifiex snd_soc_davinci_mcasp snd_soc_ti_udma sa2ul snd_soc_ti_edma snd_soc_ti_sdma authenc ina2xx snd_soc_nau8822 tps65219_pwrbutton at24 ti_ads1015 industrialio_triggered_buffer kfifo_buf lm75 rtc_ds1307 spi_omap2_mcspi 8021q garp mrp stp llc cfg80211 bluetooth ecdh_generic ecc rfkill libcomposite fuse drm backlight ipv6
[   29.319532] CPU: 0 PID: 55 Comm: kworker/u3:0 Not tainted 6.6.0-rc5-next-20231010-dirty #35
[   29.327883] Hardware name: Toradex Verdin AM62 WB on Verdin Development Board (DT)
[   29.335450] Workqueue: hci0 hci_power_off [bluetooth]
[   29.340775] Call trace:
[   29.343220]  dump_backtrace+0x98/0x118
[   29.346984]  show_stack+0x18/0x24
[   29.350303]  dump_stack_lvl+0x48/0x60
[   29.353971]  dump_stack+0x18/0x24
[   29.357287]  __schedule_bug+0x50/0x68
[   29.360953]  __schedule+0x7f0/0xa94
[   29.364446]  schedule+0x34/0xc8
[   29.367588]  rpm_resume+0x170/0x6c8
[   29.371083]  __pm_runtime_resume+0x4c/0x90
[   29.375182]  omap8250_set_mctrl+0x2c/0xbc
[   29.379198]  serial8250_set_mctrl+0x20/0x40
[   29.383387]  uart_update_mctrl+0x58/0x78
[   29.387311]  uart_dtr_rts+0x104/0x114
[   29.390975]  tty_port_shutdown+0xd4/0xdc
[   29.394903]  tty_port_close+0x40/0xbc
[   29.398567]  uart_close+0x34/0x9c
[   29.401882]  ttyport_close+0x50/0x94
[   29.405460]  serdev_device_close+0x40/0x50
[   29.409557]  btnxpuart_close+0x48/0xbc [btnxpuart]
[   29.414364]  hci_dev_close_sync+0x2ec/0x754 [bluetooth]
[   29.419747]  hci_dev_do_close+0x2c/0x70 [bluetooth]
[   29.424772]  hci_power_off+0x20/0x64 [bluetooth]
[   29.429536]  process_one_work+0x138/0x244
[   29.433551]  worker_thread+0x320/0x438
[   29.437302]  kthread+0x114/0x118
[   29.440533]  ret_from_fork+0x10/0x20
[   29.445838] BUG: scheduling while atomic: kworker/u3:0/55/0x00000000
[   29.453556] Modules linked in: snd_soc_simple_card snd_soc_simple_card_utils crct10dif_ce btnxpuart usb_f_ncm u_ether rtc_ti_k3 k3_j72xx_bandgap mwifiex_sdio mwifiex snd_soc_davinci_mcasp snd_soc_ti_udma sa2ul snd_soc_ti_edma snd_soc_ti_sdma authenc ina2xx snd_soc_nau8822 tps65219_pwrbutton at24 ti_ads1015 industrialio_triggered_buffer kfifo_buf lm75 rtc_ds1307 spi_omap2_mcspi 8021q garp mrp stp llc cfg80211 bluetooth ecdh_generic ecc rfkill libcomposite fuse drm backlight ipv6
[   29.496085] CPU: 0 PID: 55 Comm: kworker/u3:0 Tainted: G        W          6.6.0-rc5-next-20231010-dirty #35
[   29.505912] Hardware name: Toradex Verdin AM62 WB on Verdin Development Board (DT)
[   29.513478] Workqueue: hci0 hci_power_off [bluetooth]
[   29.518787] Call trace:
[   29.521232]  dump_backtrace+0x98/0x118
[   29.524993]  show_stack+0x18/0x24
[   29.528312]  dump_stack_lvl+0x48/0x60
[   29.531980]  dump_stack+0x18/0x24
[   29.535296]  __schedule_bug+0x50/0x68
[   29.538961]  __schedule+0x7f0/0xa94
[   29.542452]  schedule+0x34/0xc8
[   29.545593]  rpm_resume+0x170/0x6c8
[   29.549087]  __pm_runtime_resume+0x4c/0x90
[   29.553186]  omap_8250_pm+0x28/0x110
[   29.556766]  serial8250_pm+0x18/0x3c
[   29.560347]  uart_tty_port_shutdown+0xa4/0x114
[   29.564792]  tty_port_shutdown+0x84/0xdc
[   29.568718]  tty_port_close+0x40/0xbc
[   29.572382]  uart_close+0x34/0x9c
[   29.575697]  ttyport_close+0x50/0x94
[   29.579275]  serdev_device_close+0x40/0x50
[   29.583373]  btnxpuart_close+0x48/0xbc [btnxpuart]
[   29.588179]  hci_dev_close_sync+0x2ec/0x754 [bluetooth]
[   29.593562]  hci_dev_do_close+0x2c/0x70 [bluetooth]
[   29.598587]  hci_power_off+0x20/0x64 [bluetooth]
[   29.603353]  process_one_work+0x138/0x244
[   29.607369]  worker_thread+0x320/0x438
[   29.611119]  kthread+0x114/0x118
[   29.614352]  ret_from_fork+0x10/0x20

 drivers/bluetooth/btnxpuart.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/bluetooth/btnxpuart.c b/drivers/bluetooth/btnxpuart.c
index b7e66b7ac570..9cb7529eef09 100644
--- a/drivers/bluetooth/btnxpuart.c
+++ b/drivers/bluetooth/btnxpuart.c
@@ -1234,6 +1234,9 @@ static int btnxpuart_close(struct hci_dev *hdev)
 
 	ps_wakeup(nxpdev);
 	serdev_device_close(nxpdev->serdev);
+	skb_queue_purge(&nxpdev->txq);
+	kfree_skb(nxpdev->rx_skb);
+	nxpdev->rx_skb = NULL;
 	clear_bit(BTNXPUART_SERDEV_OPEN, &nxpdev->tx_state);
 	return 0;
 }
-- 
2.36.1
Re: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Francesco Dolcini 2 years ago
Hello all,

On Wed, Oct 18, 2023 at 04:55:39PM +0200, Marcel Ziswiler wrote:
> From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> 
> Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while
> atomic. Fix this by properly purging the transmit queue and freeing the
> receive skb.
> 
> Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets")
> 
> Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> ---
> This is the kernel trace this commit fixes:
> [   29.270685] BUG: scheduling while atomic: kworker/u3:0/55/0x00000002

I just hit this bug with 6.7-rc2, I think it would be worth to
apply this fix.


[   70.443965] BUG: scheduling while atomic: kworker/u5:0/65/0x00000002
[   70.450649] Modules linked in: 8021q garp mrp stp llc usb_f_ncm u_ether spidev mwifiex_sdio mwifiex snd_soc_simple_card snd_soc_simple_card_utils crct10dif_ce cfg80211 k3_j72xx_bandgap rti_wdt rtc_ti_k3 btnxpuart bluetooth ecdh_generic ecc snd_soc_davinci_mcasp sa2ul sha256_generic rfkill snd_soc_ti_udma libsha256 snd_soc_ti_edma authenc tidss snd_soc_ti_sdma omap_mailbox drm_dma_helper ti_ads1015 industrialio_triggered_buffer lontium_lt8912b snd_soc_wm8904 kfifo_buf lm75 at24 tps65219_pwrbutton ina2xx m_can_platform rtc_ds1307 tc358768 m_can display_connector spi_omap2_mcspi pwm_tiehrpwm can_dev drm_kms_helper libcomposite fuse drm backlight ipv6
[   70.509384] CPU: 0 PID: 65 Comm: kworker/u5:0 Not tainted 6.7.0-rc2-00147-gf1a09972a45a #1
[   70.517747] Hardware name: Toradex Verdin AM62 WB on Dahlia Board (DT)
[   70.524334] Workqueue: hci0 hci_power_on [bluetooth]
[   70.529870] Call trace:
[   70.532349]  dump_backtrace+0x94/0xec
[   70.536103]  show_stack+0x18/0x24
[   70.539491]  dump_stack_lvl+0x48/0x60
[   70.543210]  dump_stack+0x18/0x24
[   70.546575]  __schedule_bug+0x50/0x68
[   70.550312]  __schedule+0x7c4/0xa64
[   70.553874]  schedule+0x34/0xa0
[   70.557083]  schedule_timeout+0x180/0x25c
[   70.561151]  wait_for_completion_timeout+0x80/0x15c
[   70.566101]  ti_sci_set_device_state+0x164/0x22c
[   70.570790]  ti_sci_cmd_get_device_exclusive+0x18/0x24
[   70.575991]  ti_sci_pd_power_on+0x28/0x48
[   70.580073]  _genpd_power_on+0x94/0x154
[   70.583964]  genpd_power_on.part.0+0xa4/0x174
[   70.588383]  genpd_runtime_resume+0x118/0x294
[   70.592800]  __rpm_callback+0x48/0x140
[   70.596616]  rpm_callback+0x6c/0x78
[   70.600165]  rpm_resume+0x3bc/0x59c
[   70.603714]  __pm_runtime_resume+0x4c/0x90
[   70.607870]  omap8250_set_mctrl+0x2c/0xc0
[   70.611954]  serial8250_set_mctrl.part.0+0x18/0x34
[   70.616801]  serial8250_set_mctrl+0x18/0x28
[   70.621038]  uart_update_mctrl+0x58/0x78
[   70.625024]  uart_dtr_rts+0x108/0x118
[   70.628750]  tty_port_shutdown+0x88/0xd8
[   70.632744]  tty_port_close+0x50/0xac
[   70.636472]  uart_close+0x28/0x80
[   70.639850]  ttyport_close+0x50/0x94
[   70.643500]  serdev_device_close+0x40/0x50
[   70.647664]  btnxpuart_close+0x24/0x84 [btnxpuart]
[   70.652553]  hci_dev_open_sync+0x3f8/0x9dc [bluetooth]
[   70.658144]  hci_dev_do_open+0x28/0x48 [bluetooth]
[   70.663377]  hci_power_on+0x4c/0x2ac [bluetooth]
[   70.668441]  process_scheduled_works+0x16c/0x28c
[   70.673135]  worker_thread+0x16c/0x2e0
[   70.676950]  kthread+0x11c/0x128
[   70.680247]  ret_from_fork+0x10/0x20
Re: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Francesco Dolcini 1 year, 9 months ago
Hi Luiz,

On Fri, Nov 24, 2023 at 09:26:11PM +0100, Francesco Dolcini wrote:
> On Wed, Oct 18, 2023 at 04:55:39PM +0200, Marcel Ziswiler wrote:
> > From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > 
> > Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while
> > atomic. Fix this by properly purging the transmit queue and freeing the
> > receive skb.
> > 
> > Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets")
> > 
> > Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > ---
> > This is the kernel trace this commit fixes:
> > [   29.270685] BUG: scheduling while atomic: kworker/u3:0/55/0x00000002
> 
> I just hit this bug with 6.7-rc2, I think it would be worth to
> apply this fix.

Do you need any change for having this patch (1/2) applied? Do you want this
to be re-sent without the second patch (2/2) from this series that is
maybe more controversial?

Let me know how I can help,

Thanks,
Francesco
Re: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Luiz Augusto von Dentz 1 year, 9 months ago
Hi Francesco,

On Mon, Mar 4, 2024 at 11:52 AM Francesco Dolcini <francesco@dolcini.it> wrote:
>
> Hi Luiz,
>
> On Fri, Nov 24, 2023 at 09:26:11PM +0100, Francesco Dolcini wrote:
> > On Wed, Oct 18, 2023 at 04:55:39PM +0200, Marcel Ziswiler wrote:
> > > From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > >
> > > Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while
> > > atomic. Fix this by properly purging the transmit queue and freeing the
> > > receive skb.
> > >
> > > Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP Bluetooth chipsets")
> > >
> > > Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > > ---
> > > This is the kernel trace this commit fixes:
> > > [   29.270685] BUG: scheduling while atomic: kworker/u3:0/55/0x00000002
> >
> > I just hit this bug with 6.7-rc2, I think it would be worth to
> > apply this fix.
>
> Do you need any change for having this patch (1/2) applied? Do you want this
> to be re-sent without the second patch (2/2) from this series that is
> maybe more controversial?

Yes please just resend it.

> Let me know how I can help,
>
> Thanks,
> Francesco
>


-- 
Luiz Augusto von Dentz
RE: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Neeraj sanjay kale 2 years, 2 months ago
Hi Marcel,

Thank you for your patch.

> From: Marcel Ziswiler <marcel@ziswiler.com>
> Sent: Wednesday, October 18, 2023 8:26 PM
> To: linux-bluetooth@vger.kernel.org
> Cc: Sherry Sun <sherry.sun@nxp.com>; Johan Hedberg
> <johan.hedberg@gmail.com>; Luiz Augusto von Dentz
> <luiz.dentz@gmail.com>; Neeraj sanjay kale <neeraj.sanjaykale@nxp.com>;
> linux-kernel@vger.kernel.org; Marcel Holtmann <marcel@holtmann.org>;
> Marcel Ziswiler <marcel.ziswiler@toradex.com>; Amitkumar Karwar
> <amitkumar.karwar@nxp.com>; Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Subject: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
> 
> From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> 
> Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while atomic.
> Fix this by properly purging the transmit queue and freeing the receive skb.
> 
> Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP
> Bluetooth chipsets")
> 
> Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> ---
> 
>  drivers/bluetooth/btnxpuart.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/bluetooth/btnxpuart.c b/drivers/bluetooth/btnxpuart.c
> index b7e66b7ac570..9cb7529eef09 100644
> --- a/drivers/bluetooth/btnxpuart.c
> +++ b/drivers/bluetooth/btnxpuart.c
> @@ -1234,6 +1234,9 @@ static int btnxpuart_close(struct hci_dev *hdev)
> 
>         ps_wakeup(nxpdev);
>         serdev_device_close(nxpdev->serdev);
> +       skb_queue_purge(&nxpdev->txq);
> +       kfree_skb(nxpdev->rx_skb);
> +       nxpdev->rx_skb = NULL;
>         clear_bit(BTNXPUART_SERDEV_OPEN, &nxpdev->tx_state);
>         return 0;
>  }
This is already done in btnxpuart_flush(), which is called by hci_dev_close_sync(), before it calls btnxpuart_close().
Is btnxpuart_flush() not called during your testing?

Thanks,
Neeraj
Re: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Marcel Ziswiler 2 years, 2 months ago
Hi Neeraj

On Wed, 2023-10-18 at 15:36 +0000, Neeraj sanjay kale wrote:
> Hi Marcel,
> 
> Thank you for your patch.
> 
> > From: Marcel Ziswiler <marcel@ziswiler.com>
> > Sent: Wednesday, October 18, 2023 8:26 PM
> > To: linux-bluetooth@vger.kernel.org
> > Cc: Sherry Sun <sherry.sun@nxp.com>; Johan Hedberg
> > <johan.hedberg@gmail.com>; Luiz Augusto von Dentz
> > <luiz.dentz@gmail.com>; Neeraj sanjay kale <neeraj.sanjaykale@nxp.com>;
> > linux-kernel@vger.kernel.org; Marcel Holtmann <marcel@holtmann.org>;
> > Marcel Ziswiler <marcel.ziswiler@toradex.com>; Amitkumar Karwar
> > <amitkumar.karwar@nxp.com>; Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Subject: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
> > 
> > From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > 
> > Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while atomic.
> > Fix this by properly purging the transmit queue and freeing the receive skb.
> > 
> > Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP
> > Bluetooth chipsets")
> > 
> > Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > ---
> > 
> >  drivers/bluetooth/btnxpuart.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/bluetooth/btnxpuart.c b/drivers/bluetooth/btnxpuart.c
> > index b7e66b7ac570..9cb7529eef09 100644
> > --- a/drivers/bluetooth/btnxpuart.c
> > +++ b/drivers/bluetooth/btnxpuart.c
> > @@ -1234,6 +1234,9 @@ static int btnxpuart_close(struct hci_dev *hdev)
> > 
> >         ps_wakeup(nxpdev);
> >         serdev_device_close(nxpdev->serdev);
> > +       skb_queue_purge(&nxpdev->txq);
> > +       kfree_skb(nxpdev->rx_skb);
> > +       nxpdev->rx_skb = NULL;
> >         clear_bit(BTNXPUART_SERDEV_OPEN, &nxpdev->tx_state);
> >         return 0;
> >  }
> This is already done in btnxpuart_flush(), which is called by hci_dev_close_sync(), before it calls
> btnxpuart_close().

Yes, I was also wondering about that.

> Is btnxpuart_flush() not called during your testing?

Yes, I even added some more tracing to confirm this. However, without my fix (which BTW was inspired by looking
at the former hci_mrvl.c driver) this bug is really occuring. Just keep loading/un-loading the kernel module a
few times any you may hit it.

> Thanks,
> Neeraj

Cheers

Marcel
Re: [PATCH v1 1/2] Bluetooth: btnxpuart: Fix btnxpuart_close
Posted by Neeraj sanjay kale 2 years, 2 months ago
Hi Marcel

> > > From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > >
> > > Unfortunately, btnxpuart_close() may trigger a BUG: scheduling while
> atomic.
> > > Fix this by properly purging the transmit queue and freeing the receive
> skb.
> > >
> > > Fixes: 689ca16e5232 ("Bluetooth: NXP: Add protocol support for NXP
> > > Bluetooth chipsets")
> > >
> > > Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > > ---
> > >
> > >  drivers/bluetooth/btnxpuart.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/bluetooth/btnxpuart.c
> > > b/drivers/bluetooth/btnxpuart.c index b7e66b7ac570..9cb7529eef09
> > > 100644
> > > --- a/drivers/bluetooth/btnxpuart.c
> > > +++ b/drivers/bluetooth/btnxpuart.c
> > > @@ -1234,6 +1234,9 @@ static int btnxpuart_close(struct hci_dev
> > > *hdev)
> > >
> > >         ps_wakeup(nxpdev);
> > >         serdev_device_close(nxpdev->serdev);
> > > +       skb_queue_purge(&nxpdev->txq);
> > > +       kfree_skb(nxpdev->rx_skb);
> > > +       nxpdev->rx_skb = NULL;
> > >         clear_bit(BTNXPUART_SERDEV_OPEN, &nxpdev->tx_state);
> > >         return 0;
> > >  }
> > This is already done in btnxpuart_flush(), which is called by
> > hci_dev_close_sync(), before it calls btnxpuart_close().
> 
> Yes, I was also wondering about that.
> 
> > Is btnxpuart_flush() not called during your testing?
> 
> Yes, I even added some more tracing to confirm this. However, without my
> fix (which BTW was inspired by looking at the former hci_mrvl.c driver) this
> bug is really occuring. Just keep loading/un-loading the kernel module a few
> times any you may hit it.
> 
Our QA team has been running load/unload tests for quite some time now, and
such an issue was never reported.
I am not sure why you do not see the btnxpuart_flush () been called, but I tested this patch
on my setup, where both, btnxpuart_flush() and btnxpuart_close() are called, and I
don’t see any issue due to kfree_skb and txq purge been done twice.

This looks ok to me.

Reviewed-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>