drivers/net/ethernet/renesas/ravb_main.c | 3 +++ 1 file changed, 3 insertions(+)
In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
If timeout occurs, it will start the work. And if we call
ravb_remove without finishing the work, there may be a
use-after-free bug on ndev.
Fix it by finishing the job before cleanup in ravb_remove.
Note that this bug is found by static analysis, it might be
false positive.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
---
v4:
- add information about the bug was found suggested by Yunsheng Lin
v3:
- fix typo in commit message
v2:
- stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
add an empty line to make code clear suggested by Sergey Shtylyov
---
drivers/net/ethernet/renesas/ravb_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 4d6b3b7d6abb..ce2da5101e51 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
struct ravb_private *priv = netdev_priv(ndev);
const struct ravb_hw_info *info = priv->info;
+ netif_carrier_off(ndev);
+ netif_tx_disable(ndev);
+ cancel_work_sync(&priv->work);
/* Stop PTP Clock driver */
if (info->ccc_gac)
ravb_ptp_stop(ndev);
--
2.25.1
On Tue, 25 Jul 2023, Zheng Wang wrote:
> In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> If timeout occurs, it will start the work. And if we call
> ravb_remove without finishing the work, there may be a
> use-after-free bug on ndev.
>
> Fix it by finishing the job before cleanup in ravb_remove.
>
> Note that this bug is found by static analysis, it might be
> false positive.
>
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
> ---
> v4:
> - add information about the bug was found suggested by Yunsheng Lin
> v3:
> - fix typo in commit message
> v2:
> - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> add an empty line to make code clear suggested by Sergey Shtylyov
> ---
> drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> 1 file changed, 3 insertions(+)
Trying my best not to sound like a broken record, but ...
What's the latest with this fix? Is a v5 en route?
--
Lee Jones [李琼斯]
On Tue, 15 Aug 2023, Lee Jones wrote:
> On Tue, 25 Jul 2023, Zheng Wang wrote:
>
> > In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> > If timeout occurs, it will start the work. And if we call
> > ravb_remove without finishing the work, there may be a
> > use-after-free bug on ndev.
> >
> > Fix it by finishing the job before cleanup in ravb_remove.
> >
> > Note that this bug is found by static analysis, it might be
> > false positive.
> >
> > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
> > ---
> > v4:
> > - add information about the bug was found suggested by Yunsheng Lin
> > v3:
> > - fix typo in commit message
> > v2:
> > - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> > add an empty line to make code clear suggested by Sergey Shtylyov
> > ---
> > drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> > 1 file changed, 3 insertions(+)
>
> Trying my best not to sound like a broken record, but ...
>
> What's the latest with this fix? Is a v5 en route?
Any update please Zheng Wang?
--
Lee Jones [李琼斯]
Sorry for my late reply. I'll update another patch later today.
Best regards,
Zheng
Lee Jones <lee@kernel.org> 于2023年8月29日周二 21:46写道:
>
> On Tue, 15 Aug 2023, Lee Jones wrote:
>
> > On Tue, 25 Jul 2023, Zheng Wang wrote:
> >
> > > In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> > > If timeout occurs, it will start the work. And if we call
> > > ravb_remove without finishing the work, there may be a
> > > use-after-free bug on ndev.
> > >
> > > Fix it by finishing the job before cleanup in ravb_remove.
> > >
> > > Note that this bug is found by static analysis, it might be
> > > false positive.
> > >
> > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
> > > ---
> > > v4:
> > > - add information about the bug was found suggested by Yunsheng Lin
> > > v3:
> > > - fix typo in commit message
> > > v2:
> > > - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> > > add an empty line to make code clear suggested by Sergey Shtylyov
> > > ---
> > > drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> >
> > Trying my best not to sound like a broken record, but ...
> >
> > What's the latest with this fix? Is a v5 en route?
>
> Any update please Zheng Wang?
>
> --
> Lee Jones [李琼斯]
Hi everyone,
After reviewing all comments about the patch. I agree with Jakub. But
adding reference on net_device is a big move. All related drivers must
modify the code.
For now, I couldn't think a better idea about the fix. Thanks for your
effort and sorry for my late reply.
Best regards,
Zheng Wang
Zheng Hacker <hackerzheng666@gmail.com> 于2023年8月30日周三 12:30写道:
>
> Sorry for my late reply. I'll update another patch later today.
>
> Best regards,
> Zheng
>
> Lee Jones <lee@kernel.org> 于2023年8月29日周二 21:46写道:
> >
> > On Tue, 15 Aug 2023, Lee Jones wrote:
> >
> > > On Tue, 25 Jul 2023, Zheng Wang wrote:
> > >
> > > > In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> > > > If timeout occurs, it will start the work. And if we call
> > > > ravb_remove without finishing the work, there may be a
> > > > use-after-free bug on ndev.
> > > >
> > > > Fix it by finishing the job before cleanup in ravb_remove.
> > > >
> > > > Note that this bug is found by static analysis, it might be
> > > > false positive.
> > > >
> > > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > > Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
> > > > ---
> > > > v4:
> > > > - add information about the bug was found suggested by Yunsheng Lin
> > > > v3:
> > > > - fix typo in commit message
> > > > v2:
> > > > - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> > > > add an empty line to make code clear suggested by Sergey Shtylyov
> > > > ---
> > > > drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > >
> > > Trying my best not to sound like a broken record, but ...
> > >
> > > What's the latest with this fix? Is a v5 en route?
> >
> > Any update please Zheng Wang?
> >
> > --
> > Lee Jones [李琼斯]
On Tue, 25 Jul 2023 11:00:26 +0800 Zheng Wang wrote: > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index 4d6b3b7d6abb..ce2da5101e51 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > struct ravb_private *priv = netdev_priv(ndev); > const struct ravb_hw_info *info = priv->info; > > + netif_carrier_off(ndev); > + netif_tx_disable(ndev); > + cancel_work_sync(&priv->work); Still racy, the carrier can come back up after canceling the work. But whatever, this is a non-issue in the first place. The fact that ravb_tx_timeout_work doesn't take any locks seems much more suspicious.
On 26.07.2023 05:19, Jakub Kicinski wrote: ... > The fact that ravb_tx_timeout_work doesn't take any locks seems much > more suspicious. Does anybody plan to look into this, too? Best regards Dirk
Hello Behme, > From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM > > On 26.07.2023 05:19, Jakub Kicinski wrote: > ... > > The fact that ravb_tx_timeout_work doesn't take any locks seems much > > more suspicious. > Does anybody plan to look into this, too? I believe my fixed patch [1] resolved this issue too. Let me explain it in detail below. In the thread, Jakub also mentioned [2] like below: --- Simplest fix I can think of is to take a reference on the netdev before scheduling the work, and then check if it's still registered in the work itself. Wrap the timeout work in rtnl_lock() to avoid any races there. --- Sergey suggested to add cancel_work_sync() into the ravb_close () [3]. And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4] like below: ----------------------------------------------------------------------- ravb_remove() calls unregister_netdev(). -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). --> unregiter_netdevice_queue() ---> unregiter_netdevice_many() ----> unregiter_netdevice_many_notify(). -----> dev_close_many() ------> __dev_close_many() -------> ops->ndo_stop() ravb_close() calls phy_stop() -> phy_state_machine() with PHY_HALTED --> phy_link_down() ---> phy_link_change() ----> netif_carrier_off() ----------------------------------------------------------------------- So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(), it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work(). [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89 [2] https://lore.kernel.org/netdev/20230727164820.48c9e685@kernel.org/ [3] https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2-0cf769b48187@omp.ru/ [4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@OSYPR01MB5334.jpnprd01.prod.outlook.com/ Best regards, Yoshihiro Shimoda > Best regards > > Dirk
Hi,
On 12.10.2023 10:39, Yoshihiro Shimoda wrote:
> Hello Behme,
>
>> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM
>>
>> On 26.07.2023 05:19, Jakub Kicinski wrote:
>> ...
>>> The fact that ravb_tx_timeout_work doesn't take any locks seems much
>>> more suspicious.
>> Does anybody plan to look into this, too?
>
> I believe my fixed patch [1] resolved this issue too.
I'm not an expert of this driver nor the network stack, so sorry if I'm
totally wrong here ;) But somehow this answer confuses me. Let me explain:
What you did with [1] is to stop/cancel the workqueue in ravb_close().
That's fine. But that is at driver's close time.
What's about driver's normal runtime? What I understood is that
ravb_tx_timeout_work() might run totally asynchronously to the rest of
the driver. And therefore needs locking/protection/synchronization if it
uses resources of the driver which are used elsewhere in the driver, too.
I think this is exactly what is described with:
> ---
> Simplest fix I can think of is to take a reference on the netdev before
> scheduling the work, and then check if it's still registered in the work
> itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
> ---
So, where is above done? Not at close time, but at normal run time of
the driver?
Best regards
Dirk
> Sergey suggested to add cancel_work_sync() into the ravb_close () [3].
> And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4]
> like below:
> -----------------------------------------------------------------------
> ravb_remove() calls unregister_netdev().
> -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> --> unregiter_netdevice_queue()
> ---> unregiter_netdevice_many()
> ----> unregiter_netdevice_many_notify().
> -----> dev_close_many()
> ------> __dev_close_many()
> -------> ops->ndo_stop()
>
> ravb_close() calls phy_stop()
> -> phy_state_machine() with PHY_HALTED
> --> phy_link_down()
> ---> phy_link_change()
> ----> netif_carrier_off()
> -----------------------------------------------------------------------
>
> So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(),
> it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work().
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89
> [2] https://lore.kernel.org/netdev/20230727164820.48c9e685@kernel.org/
> [3] https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2-0cf769b48187@omp.ru/
> [4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@OSYPR01MB5334.jpnprd01.prod.outlook.com/
>
> Best regards,
> Yoshihiro Shimoda
>
>> Best regards
>>
>> Dirk
--
======================================================================
Dirk Behme Robert Bosch Car Multimedia GmbH
CM/ESO2
Phone: +49 5121 49-3274 Dirk Behme
Fax: +49 711 811 5053274 PO Box 77 77 77
mailto:dirk.behme@de.bosch.com D-31132 Hildesheim - Germany
Bosch Group, Car Multimedia (CM)
Engineering SW Operating Systems 2 (ESO2)
Robert Bosch Car Multimedia GmbH - Ein Unternehmen der Bosch Gruppe
Sitz: Hildesheim
Registergericht: Amtsgericht Hildesheim HRB 201334
Aufsichtsratsvorsitzender: Dr. Dirk Hoheisel
Geschäftsführung: Dr. Steffen Berns;
Dr. Sven Ost, Jörg Pollak, Dr. Walter Schirm
======================================================================
Hi, > From: Behme Dirk (CM/ESO2), Sent: Friday, October 13, 2023 3:05 PM > > Hi, > > On 12.10.2023 10:39, Yoshihiro Shimoda wrote: > > Hello Behme, > > > >> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM > >> > >> On 26.07.2023 05:19, Jakub Kicinski wrote: > >> ... > >>> The fact that ravb_tx_timeout_work doesn't take any locks seems much > >>> more suspicious. > >> Does anybody plan to look into this, too? > > > > I believe my fixed patch [1] resolved this issue too. > > > I'm not an expert of this driver nor the network stack, so sorry if I'm > totally wrong here ;) But somehow this answer confuses me. Let me explain: > > What you did with [1] is to stop/cancel the workqueue in ravb_close(). > That's fine. But that is at driver's close time. > > What's about driver's normal runtime? What I understood is that > ravb_tx_timeout_work() might run totally asynchronously to the rest of > the driver. And therefore needs locking/protection/synchronization if it > uses resources of the driver which are used elsewhere in the driver, too. > > I think this is exactly what is described with: > > > --- > > Simplest fix I can think of is to take a reference on the netdev before > > scheduling the work, and then check if it's still registered in the work > > itself. Wrap the timeout work in rtnl_lock() to avoid any races there. > > --- > > So, where is above done? Not at close time, but at normal run time of > the driver? Thank you very much for your detailed explanation. I understood it. ravb_tx_timeout_work() still has races between ethtool ops for instance. So, I'll make a patch for it by early next week. However, IIUC, using rtnl_lock() in ravb_tx_timeout_work() is possible to cause deadlock from cancel_work_sync() in ravb_close(). So, I'll use rtnl_trylock() instead. Best regards, Yoshihiro Shimoda > Best regards > > Dirk > > > Sergey suggested to add cancel_work_sync() into the ravb_close () [3]. > > And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4] > > like below: > > ----------------------------------------------------------------------- > > ravb_remove() calls unregister_netdev(). > > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). > > --> unregiter_netdevice_queue() > > ---> unregiter_netdevice_many() > > ----> unregiter_netdevice_many_notify(). > > -----> dev_close_many() > > ------> __dev_close_many() > > -------> ops->ndo_stop() > > > > ravb_close() calls phy_stop() > > -> phy_state_machine() with PHY_HALTED > > --> phy_link_down() > > ---> phy_link_change() > > ----> netif_carrier_off() > > ----------------------------------------------------------------------- > > > > So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(), > > it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work(). > > > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git%25 > 2Fnetdev%2Fnet.git%2Fcommit%2F%3Fid%3D3971442870713de527684398416970cf025b4f89&data=05%7C01%7Cyoshihiro.shimoda.uh%4 > 0renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUn > known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HkA8f5a > gawjXMvAGkaE6tELaSpjpbIn7M3mU5xbDTD0%3D&reserved=0 > > [2] > https://lore.kernel.org/netdev/20230727164820.48c9e685 > %40kernel.org%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d82571da19 > 47e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTi > I6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=cGvnA8WqxM%2FUDa%2FNS2OBztr1IWgjCX4IzBYXe1LGkZU%3D&reserved=0 > > [3] > https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2 > -0cf769b48187%40omp.ru%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d > 82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luM > zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=OWwBKy%2Fdckgo3clPPfn2hxE4H6ToyqdcbhPhGoqoo30%3D&reserved=0 > > [4] > https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3 > BA41A6752CD8F9A%40OSYPR01MB5334.jpnprd01.prod.outlook.com%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466 > e046b20b548b264f808dbcbb255f6%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3 > d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Jfypf10jiUfTqWUAukjnPzIQp > urx7m0ETF5N2Toq8wE%3D&reserved=0 > > > > Best regards, > > Yoshihiro Shimoda > > > >> Best regards > >> > >> Dirk > > -- > ====================================================================== > Dirk Behme Robert Bosch Car Multimedia GmbH > CM/ESO2 > Phone: +49 5121 49-3274 Dirk Behme > Fax: +49 711 811 5053274 PO Box 77 77 77 > mailto:dirk.behme@de.bosch.com D-31132 Hildesheim - Germany > > Bosch Group, Car Multimedia (CM) > Engineering SW Operating Systems 2 (ESO2) > > Robert Bosch Car Multimedia GmbH - Ein Unternehmen der Bosch Gruppe > Sitz: Hildesheim > Registergericht: Amtsgericht Hildesheim HRB 201334 > Aufsichtsratsvorsitzender: Dr. Dirk Hoheisel > Geschäftsführung: Dr. Steffen Berns; > Dr. Sven Ost, Jörg Pollak, Dr. Walter Schirm > ======================================================================
On Tue, 2023-07-25 at 20:19 -0700, Jakub Kicinski wrote: > On Tue, 25 Jul 2023 11:00:26 +0800 Zheng Wang wrote: > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > > index 4d6b3b7d6abb..ce2da5101e51 100644 > > --- a/drivers/net/ethernet/renesas/ravb_main.c > > +++ b/drivers/net/ethernet/renesas/ravb_main.c > > @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > > struct ravb_private *priv = netdev_priv(ndev); > > const struct ravb_hw_info *info = priv->info; > > > > + netif_carrier_off(ndev); > > + netif_tx_disable(ndev); > > + cancel_work_sync(&priv->work); > > Still racy, the carrier can come back up after canceling the work. I must admit I don't see how/when this driver sets the carrier on ?!? > But whatever, this is a non-issue in the first place. Do you mean the UaF can't happen? I think that is real. > The fact that ravb_tx_timeout_work doesn't take any locks seems much > more suspicious. Indeed! But that should be a different patch, right? Waiting a little more for feedback from renesas. /P
Hello!
On 7/27/23 11:21 AM, Paolo Abeni wrote:
[...]
>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 4d6b3b7d6abb..ce2da5101e51 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
>>> struct ravb_private *priv = netdev_priv(ndev);
>>> const struct ravb_hw_info *info = priv->info;
>>>
>>> + netif_carrier_off(ndev);
>>> + netif_tx_disable(ndev);
>>> + cancel_work_sync(&priv->work);
>>
>> Still racy, the carrier can come back up after canceling the work.
>
> I must admit I don't see how/when this driver sets the carrier on ?!?
The phylib code does it for this MAC driver, see the call tree of
phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/...
>> But whatever, this is a non-issue in the first place.
>
> Do you mean the UaF can't happen? I think that is real.
Looks possible to me, at least now... and anyway, shouldn't we clean up
after ourselves if we call schedule_work()?However my current impression is
that cancel_work_sync() should be called from ravb_close(), after calling
phy_{stop|disconnect}()...
>> The fact that ravb_tx_timeout_work doesn't take any locks seems much
>> more suspicious.
>
> Indeed! But that should be a different patch, right?
Yes.
> Waiting a little more for feedback from renesas.
Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
driver patches, so I took that task upon myself. I also happen to be a nominal
author of this driver... :-)
> /P
MBR, Sergey
On Thu, Jul 27, 2023 at 09:48:41PM +0300, Sergey Shtylyov wrote:
> Hello!
>
> On 7/27/23 11:21 AM, Paolo Abeni wrote:
> [...]
> >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>> index 4d6b3b7d6abb..ce2da5101e51 100644
> >>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> >>> struct ravb_private *priv = netdev_priv(ndev);
> >>> const struct ravb_hw_info *info = priv->info;
> >>>
> >>> + netif_carrier_off(ndev);
> >>> + netif_tx_disable(ndev);
> >>> + cancel_work_sync(&priv->work);
> >>
> >> Still racy, the carrier can come back up after canceling the work.
> >
> > I must admit I don't see how/when this driver sets the carrier on ?!?
>
> The phylib code does it for this MAC driver, see the call tree of
> phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/...
>
> >> But whatever, this is a non-issue in the first place.
> >
> > Do you mean the UaF can't happen? I think that is real.
>
> Looks possible to me, at least now... and anyway, shouldn't we clean up
> after ourselves if we call schedule_work()?However my current impression is
> that cancel_work_sync() should be called from ravb_close(), after calling
> phy_{stop|disconnect}()...
>
> >> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >> more suspicious.
> >
> > Indeed! But that should be a different patch, right?
>
> Yes.
>
> > Waiting a little more for feedback from renesas.
>
> Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
> driver patches, so I took that task upon myself. I also happen to be a nominal
> author of this driver... :-)
FWIIW, that matches my recollection.
Although it may be out of date by now.
On Thu, 27 Jul 2023 21:48:41 +0300 Sergey Shtylyov wrote:
> >> Still racy, the carrier can come back up after canceling the work.
> >
> > I must admit I don't see how/when this driver sets the carrier on ?!?
>
> The phylib code does it for this MAC driver, see the call tree of
> phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/...
>
> >> But whatever, this is a non-issue in the first place.
> >
> > Do you mean the UaF can't happen? I think that is real.
>
> Looks possible to me, at least now... and anyway, shouldn't we clean up
> after ourselves if we call schedule_work()?However my current impression is
> that cancel_work_sync() should be called from ravb_close(), after calling
> phy_{stop|disconnect}()...
>
> >> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >> more suspicious.
> >
> > Indeed! But that should be a different patch, right?
>
> Yes.
>
> > Waiting a little more for feedback from renesas.
>
> Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
> driver patches, so I took that task upon myself. I also happen to be a nominal
> author of this driver... :-)
Simplest fix I can think of is to take a reference on the netdev before
scheduling the work, and then check if it's still registered in the work
itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
Hello Sergey!
> From: Sergey Shtylyov, Sent: Friday, July 28, 2023 3:49 AM
>
> Hello!
>
> On 7/27/23 11:21 AM, Paolo Abeni wrote:
> [...]
> >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>> index 4d6b3b7d6abb..ce2da5101e51 100644
> >>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> >>> struct ravb_private *priv = netdev_priv(ndev);
> >>> const struct ravb_hw_info *info = priv->info;
> >>>
> >>> + netif_carrier_off(ndev);
> >>> + netif_tx_disable(ndev);
> >>> + cancel_work_sync(&priv->work);
> >>
> >> Still racy, the carrier can come back up after canceling the work.
> >
> > I must admit I don't see how/when this driver sets the carrier on ?!?
>
> The phylib code does it for this MAC driver, see the call tree of
> phy_link_change(), on e.g.
> https://elixir.bootlin.com/linux/v6.5-rc3/source
>
> >> But whatever, this is a non-issue in the first place.
> >
> > Do you mean the UaF can't happen? I think that is real.
>
> Looks possible to me, at least now... and anyway, shouldn't we clean up
> after ourselves if we call schedule_work()?However my current impression is
> that cancel_work_sync() should be called from ravb_close(), after calling
> phy_{stop|disconnect}()...
I also think so.
ravb_remove() calls unregister_netdev().
-> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
--> unregiter_netdevice_queue()
---> unregiter_netdevice_many()
----> unregiter_netdevice_many_notify().
-----> dev_close_many()
------> __dev_close_many()
-------> ops->ndo_stop()
ravb_close() calls phy_stop()
-> phy_state_machine() with PHY_HALTED
--> phy_link_down()
---> phy_link_change()
----> netif_carrier_off()
The patch will be the following:
---
drivers/net/ethernet/renesas/ravb_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 7df9f9f8e134..e452d90de7c2 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev)
of_phy_deregister_fixed_link(np);
}
+ cancel_work_sync(&priv->work);
+
if (info->multi_irqs) {
free_irq(priv->tx_irqs[RAVB_NC], ndev);
free_irq(priv->rx_irqs[RAVB_NC], ndev);
---
If this patch is acceptable, I'll submit it. But, what do you think?
Best regards,
Yoshihiro Shimoda
> >> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >> more suspicious.
> >
> > Indeed! But that should be a different patch, right?
>
> Yes.
>
> > Waiting a little more for feedback from renesas.
>
> Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
> driver patches, so I took that task upon myself. I also happen to be a nominal
> author of this driver... :-)
>
> > /P
>
> MBR, Sergey
Hello!
Concerning the subject: I doubt that UAF acronym is known to
everybody (e.g. it wasn't known to me), I think we should be able
to afford spelling out use-after-free there...
On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote:
[...]
>>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> index 4d6b3b7d6abb..ce2da5101e51 100644
>>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
>>>>> struct ravb_private *priv = netdev_priv(ndev);
>>>>> const struct ravb_hw_info *info = priv->info;
>>>>>
>>>>> + netif_carrier_off(ndev);
>>>>> + netif_tx_disable(ndev);
>>>>> + cancel_work_sync(&priv->work);
>>>>
>>>> Still racy, the carrier can come back up after canceling the work.
>>>
>>> I must admit I don't see how/when this driver sets the carrier on ?!?
>>
>> The phylib code does it for this MAC driver, see the call tree of
>> phy_link_change(), on e.g.
>> https://elixir.bootlin.com/linux/v6.5-rc3/source
>>
>>>> But whatever, this is a non-issue in the first place.
>>>
>>> Do you mean the UaF can't happen? I think that is real.
>>
>> Looks possible to me, at least now... and anyway, shouldn't we clean up
>> after ourselves if we call schedule_work()?However my current impression is
>> that cancel_work_sync() should be called from ravb_close(), after calling
>> phy_{stop|disconnect}()...
>
> I also think so.
>
> ravb_remove() calls unregister_netdev().
> -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> --> unregiter_netdevice_queue()
> ---> unregiter_netdevice_many()
> ----> unregiter_netdevice_many_notify().
> -----> dev_close_many()
> ------> __dev_close_many()
> -------> ops->ndo_stop()
>
> ravb_close() calls phy_stop()
> -> phy_state_machine() with PHY_HALTED
> --> phy_link_down()
> ---> phy_link_change()
> ----> netif_carrier_off()
Thanks for sharing the call chain, I've followed it once again... :-)
> The patch will be the following:
> ---
> drivers/net/ethernet/renesas/ravb_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 7df9f9f8e134..e452d90de7c2 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev)
> of_phy_deregister_fixed_link(np);
> }
>
> + cancel_work_sync(&priv->work);
> +
> if (info->multi_irqs) {
> free_irq(priv->tx_irqs[RAVB_NC], ndev);
> free_irq(priv->rx_irqs[RAVB_NC], ndev);
> ---
>
> If this patch is acceptable, I'll submit it. But, what do you think?
I think it should do the job. And I suspect you can even test it... :-)
> Best regards,
> Yoshihiro Shimoda
[...]
MBR, Sergey
Hello Sergey,
> From: Sergey Shtylyov, Sent: Wednesday, October 4, 2023 4:39 AM
>
> Hello!
>
> Concerning the subject: I doubt that UAF acronym is known to
> everybody (e.g. it wasn't known to me), I think we should be able
> to afford spelling out use-after-free there...
I got it. I'll change the subject.
> On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote:
> [...]
>
> >>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> index 4d6b3b7d6abb..ce2da5101e51 100644
> >>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> >>>>> struct ravb_private *priv = netdev_priv(ndev);
> >>>>> const struct ravb_hw_info *info = priv->info;
> >>>>>
> >>>>> + netif_carrier_off(ndev);
> >>>>> + netif_tx_disable(ndev);
> >>>>> + cancel_work_sync(&priv->work);
> >>>>
> >>>> Still racy, the carrier can come back up after canceling the work.
> >>>
> >>> I must admit I don't see how/when this driver sets the carrier on ?!?
> >>
> >> The phylib code does it for this MAC driver, see the call tree of
> >> phy_link_change(), on e.g.
> >>
<snip URL>
> >>
> >>>> But whatever, this is a non-issue in the first place.
> >>>
> >>> Do you mean the UaF can't happen? I think that is real.
> >>
> >> Looks possible to me, at least now... and anyway, shouldn't we clean up
> >> after ourselves if we call schedule_work()?However my current impression is
> >> that cancel_work_sync() should be called from ravb_close(), after calling
> >> phy_{stop|disconnect}()...
> >
> > I also think so.
> >
> > ravb_remove() calls unregister_netdev().
> > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> > --> unregiter_netdevice_queue()
> > ---> unregiter_netdevice_many()
> > ----> unregiter_netdevice_many_notify().
> > -----> dev_close_many()
> > ------> __dev_close_many()
> > -------> ops->ndo_stop()
> >
> > ravb_close() calls phy_stop()
> > -> phy_state_machine() with PHY_HALTED
> > --> phy_link_down()
> > ---> phy_link_change()
> > ----> netif_carrier_off()
>
> Thanks for sharing the call chain, I've followed it once again... :-)
Thank you :)
> > The patch will be the following:
> > ---
> > drivers/net/ethernet/renesas/ravb_main.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index 7df9f9f8e134..e452d90de7c2 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev)
> > of_phy_deregister_fixed_link(np);
> > }
> >
> > + cancel_work_sync(&priv->work);
> > +
> > if (info->multi_irqs) {
> > free_irq(priv->tx_irqs[RAVB_NC], ndev);
> > free_irq(priv->rx_irqs[RAVB_NC], ndev);
> > ---
> >
> > If this patch is acceptable, I'll submit it. But, what do you think?
>
> I think it should do the job.
Thank you for your comment! I'll make such a patch.
> And I suspect you can even test it... :-)
IIUC, causing tx timeout is difficult. But, I guess
we can add a fault injection code somehow.
Best regards,
Yoshihiro Shimoda
> > Best regards,
> > Yoshihiro Shimoda
>
> [...]
>
> MBR, Sergey
Hello! On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote: Sorry, I got ill that same day and still have subfebrile temperature, and I forgot about your mail. I'll try replying to it on this weekend... [...] MBR, Sergey
Hello Sergey, > From: Sergey Shtylyov, Sent: Saturday, September 30, 2023 5:23 AM > > Hello! > > On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote: > > Sorry, I got ill that same day and still have subfebrile temperature, > and I forgot about your mail. I'll try replying to it on this weekend... Thank you for your reply! I understood it. Please take care of yourself. I hope you will get well soon. Best regards, Yoshihiro Shimoda > [...] > > MBR, Sergey
© 2016 - 2026 Red Hat, Inc.