[PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm

Kuen-Han Tsai posted 2 patches 1 month, 3 weeks ago
drivers/usb/gadget/function/f_ncm.c            | 29 +++++++++++---------------
drivers/usb/gadget/function/u_ether_configfs.h | 11 +---------
drivers/usb/gadget/function/u_ncm.h            |  1 -
drivers/usb/gadget/legacy/ncm.c                | 13 +++++++++---
4 files changed, 23 insertions(+), 31 deletions(-)
[PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by Kuen-Han Tsai 1 month, 3 weeks ago
Commit 56a512a9b410 ("usb: gadget: f_ncm: align net_device lifecycle
with bind/unbind") addressed a lifetime mismatch where the network
interface outlived the parent gadget. However, this introduced two
regressions:

1. A NULL pointer dereference in the legacy g_ncm driver. The legacy
   driver attempts to access the net_device during its binding process
   before the NCM function driver is fully initialized.

2. A "sleeping function called from atomic context" error in f_ncm.
   The current implementation holds a mutex which might sleep within
   an atomic context.

To resolve these, store the configuration parameters (qmult, host_addr,
dev_addr) in opts_net until the network device is ready for g_ncm.
Additionally, remove the net_device pointer from the f_ncm_opts
structure. This eliminates the race condition with configfs and allows
dropping the mutex, preventing the atomic sleep issue.

Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
---
Changes in v2:
- Remove the RFC tag.
- Fix NPE in gncm reported by the kernel test bot.
- Fix a "sleeping function called from atomic context" error.
- Link to v1: https://lore.kernel.org/r/20260214-legacy-ncm-v1-1-139c5bcc6636@google.com

---
Kuen-Han Tsai (2):
      usb: legacy: ncm: Fix NPE in gncm_bind
      usb: gadget: f_ncm: Fix atomic context locking issue

 drivers/usb/gadget/function/f_ncm.c            | 29 +++++++++++---------------
 drivers/usb/gadget/function/u_ether_configfs.h | 11 +---------
 drivers/usb/gadget/function/u_ncm.h            |  1 -
 drivers/usb/gadget/legacy/ncm.c                | 13 +++++++++---
 4 files changed, 23 insertions(+), 31 deletions(-)
---
base-commit: da87d45b195148d670ab995367d52aa9e8a9a1fa
change-id: 20260214-legacy-ncm-8c001295b343

Best regards,
-- 
Kuen-Han Tsai <khtsai@google.com>
Re: [PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by David Heidelberg 1 month, 2 weeks ago
Hello Kuen-Han,

sadly this series is not enough to fully fix the usb issue encountered 
on qcom sdm845 platform (namely Pixel 3, OnePlus 6/6T etc.).

I didn't debugged deeply, but without these patches interface (indicated 
by NM icon) goes on/off/on/off indefinitely. With your patches it seems 
stable, but I'm not getting the DHCP address from the phone, which isn't 
issue at all when I revert the 56a512a9b4107079f68701e7d55da8507eb963d9
("usb: gadget: f_ncm: align net_device lifecycle with bind/unbind").

I think reverting the original patch would make more sense and then 
follow up with new one.

Feel free to add me into CC and I'll happily test on the sdm845 mobile 
devices for you.

David

On 21/02/2026 15:48, Kuen-Han Tsai wrote:
> Commit 56a512a9b410 ("usb: gadget: f_ncm: align net_device lifecycle
> with bind/unbind") addressed a lifetime mismatch where the network
> interface outlived the parent gadget. However, this introduced two
> regressions:
> 
> 1. A NULL pointer dereference in the legacy g_ncm driver. The legacy
>     driver attempts to access the net_device during its binding process
>     before the NCM function driver is fully initialized.
> 
> 2. A "sleeping function called from atomic context" error in f_ncm.
>     The current implementation holds a mutex which might sleep within
>     an atomic context.
> 
> To resolve these, store the configuration parameters (qmult, host_addr,
> dev_addr) in opts_net until the network device is ready for g_ncm.
> Additionally, remove the net_device pointer from the f_ncm_opts
> structure. This eliminates the race condition with configfs and allows
> dropping the mutex, preventing the atomic sleep issue.
> 
> Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
> ---
> Changes in v2:
> - Remove the RFC tag.
> - Fix NPE in gncm reported by the kernel test bot.
> - Fix a "sleeping function called from atomic context" error.
> - Link to v1: https://lore.kernel.org/r/20260214-legacy-ncm-v1-1-139c5bcc6636@google.com
> 
> ---
> Kuen-Han Tsai (2):
>        usb: legacy: ncm: Fix NPE in gncm_bind
>        usb: gadget: f_ncm: Fix atomic context locking issue
> 
>   drivers/usb/gadget/function/f_ncm.c            | 29 +++++++++++---------------
>   drivers/usb/gadget/function/u_ether_configfs.h | 11 +---------
>   drivers/usb/gadget/function/u_ncm.h            |  1 -
>   drivers/usb/gadget/legacy/ncm.c                | 13 +++++++++---
>   4 files changed, 23 insertions(+), 31 deletions(-)
> ---
> base-commit: da87d45b195148d670ab995367d52aa9e8a9a1fa
> change-id: 20260214-legacy-ncm-8c001295b343
> 
> Best regards,

-- 
David Heidelberg
Re: [PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by Kuen-Han Tsai 1 month, 2 weeks ago
Hi David, Greg,

Thanks for testing. I'm sorry the patches are causing trouble on your setup.

On Sun, Mar 1, 2026 at 5:03 AM David Heidelberg <david@ixit.cz> wrote:
>
> Hello Kuen-Han,
>
> sadly this series is not enough to fully fix the usb issue encountered
> on qcom sdm845 platform (namely Pixel 3, OnePlus 6/6T etc.).
>
> I didn't debugged deeply, but without these patches interface (indicated
> by NM icon) goes on/off/on/off indefinitely. With your patches it seems
> stable, but I'm not getting the DHCP address from the phone, which isn't
> issue at all when I revert the 56a512a9b4107079f68701e7d55da8507eb963d9
> ("usb: gadget: f_ncm: align net_device lifecycle with bind/unbind").
>
> I think reverting the original patch would make more sense and then
> follow up with new one.

The net_device lifecycle change puts us in a dilemma:

1. If we revert the original patch entirely, the NULL pointer
dereference and dangling sysfs references come back on Android.

2. If we only clear the parent gadget pointer during unbind without
unregistering the net_device, the sysfs entries still dangle.

3. If we clear the pointer and unregister the net_device without
freeing it, the device cannot be re-registered unless it is
uninitialized.

4. If we fully unregister and recreate the net_device on each bind, it
breaks DHCP on your setup.

Greg, do you have any thoughts on the best way to untangle this? I am
fully willing to submit a revert for this series to restore the
expected behavior while we figure out a proper architectural fix.

>
> Feel free to add me into CC and I'll happily test on the sdm845 mobile
> devices for you.
>
> David

David, could you share exactly what OS you are using (e.g.,
postmarketOS with an sdm845/6.18-dev tree)? Also, could you provide
some instructions on how to build the code and reproduce this problem
on a Pixel 3? If you have the time, it would be incredibly helpful if
you could dive into this a bit deeper on your device to see exactly
how the DHCP daemon is failing.

Regards,
Kuen-Han


>
> On 21/02/2026 15:48, Kuen-Han Tsai wrote:
> > Commit 56a512a9b410 ("usb: gadget: f_ncm: align net_device lifecycle
> > with bind/unbind") addressed a lifetime mismatch where the network
> > interface outlived the parent gadget. However, this introduced two
> > regressions:
> >
> > 1. A NULL pointer dereference in the legacy g_ncm driver. The legacy
> >     driver attempts to access the net_device during its binding process
> >     before the NCM function driver is fully initialized.
> >
> > 2. A "sleeping function called from atomic context" error in f_ncm.
> >     The current implementation holds a mutex which might sleep within
> >     an atomic context.
> >
> > To resolve these, store the configuration parameters (qmult, host_addr,
> > dev_addr) in opts_net until the network device is ready for g_ncm.
> > Additionally, remove the net_device pointer from the f_ncm_opts
> > structure. This eliminates the race condition with configfs and allows
> > dropping the mutex, preventing the atomic sleep issue.
> >
> > Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
> > ---
> > Changes in v2:
> > - Remove the RFC tag.
> > - Fix NPE in gncm reported by the kernel test bot.
> > - Fix a "sleeping function called from atomic context" error.
> > - Link to v1: https://lore.kernel.org/r/20260214-legacy-ncm-v1-1-139c5bcc6636@google.com
> >
> > ---
> > Kuen-Han Tsai (2):
> >        usb: legacy: ncm: Fix NPE in gncm_bind
> >        usb: gadget: f_ncm: Fix atomic context locking issue
> >
> >   drivers/usb/gadget/function/f_ncm.c            | 29 +++++++++++---------------
> >   drivers/usb/gadget/function/u_ether_configfs.h | 11 +---------
> >   drivers/usb/gadget/function/u_ncm.h            |  1 -
> >   drivers/usb/gadget/legacy/ncm.c                | 13 +++++++++---
> >   4 files changed, 23 insertions(+), 31 deletions(-)
> > ---
> > base-commit: da87d45b195148d670ab995367d52aa9e8a9a1fa
> > change-id: 20260214-legacy-ncm-8c001295b343
> >
> > Best regards,
>
> --
> David Heidelberg
>
Re: [PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by David Heidelberg 1 month, 2 weeks ago
On 02/03/2026 11:15, Kuen-Han Tsai wrote:
[...]>
> David, could you share exactly what OS you are using (e.g.,
> postmarketOS with an sdm845/6.18-dev tree)? Also, could you provide
> some instructions on how to build the code and reproduce this problem
> on a Pixel 3? If you have the time, it would be incredibly helpful if
> you could dive into this a bit deeper on your device to see exactly
> how the DHCP daemon is failing.

Hello Kuen-Han,

it's pmOS initrd, but generally I got reported same behaviour on Mobian (Mobile 
Debian) too.

The Pixel 3 support was merged, so it can be reproduced with:
1. 7.0-rc1 tag
2. -next tree (latest tested is next-20260227)
3. our sdm845-next tree [1] (some WIP patches, working touchscreen on Pixel 3, 
etc.), the tree currently contains the reverts

I can provide log with the patch [2], without the patch [3] (well, it's more 
like nothing is in the log)

I'm very lightly familiar with usb subsystem, so if you give me hints what to 
look for (or what to debug), I'll try find a moment to check to move this forward.

Thank you for working on improving usb gadgets!
David

[1] https://codeberg.org/sdm845/linux/
[2] https://paste.sr.ht/~okias/35982d7e284ee0f767e57923ced591beb4d3b238#L589
[3] https://paste.sr.ht/~okias/4e9172a34e4093445536b51e935dbd229edad7b2#L613


> 
> Regards,
> Kuen-Han
> 
> 
>>
>> On 21/02/2026 15:48, Kuen-Han Tsai wrote:
>>> Commit 56a512a9b410 ("usb: gadget: f_ncm: align net_device lifecycle
>>> with bind/unbind") addressed a lifetime mismatch where the network
>>> interface outlived the parent gadget. However, this introduced two
>>> regressions:
>>>
>>> 1. A NULL pointer dereference in the legacy g_ncm driver. The legacy
>>>      driver attempts to access the net_device during its binding process
>>>      before the NCM function driver is fully initialized.
>>>
>>> 2. A "sleeping function called from atomic context" error in f_ncm.
>>>      The current implementation holds a mutex which might sleep within
>>>      an atomic context.
>>>
>>> To resolve these, store the configuration parameters (qmult, host_addr,
>>> dev_addr) in opts_net until the network device is ready for g_ncm.
>>> Additionally, remove the net_device pointer from the f_ncm_opts
>>> structure. This eliminates the race condition with configfs and allows
>>> dropping the mutex, preventing the atomic sleep issue.
>>>
>>> Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
>>> ---
>>> Changes in v2:
>>> - Remove the RFC tag.
>>> - Fix NPE in gncm reported by the kernel test bot.
>>> - Fix a "sleeping function called from atomic context" error.
>>> - Link to v1: https://lore.kernel.org/r/20260214-legacy-ncm-v1-1-139c5bcc6636@google.com
>>>
>>> ---
>>> Kuen-Han Tsai (2):
>>>         usb: legacy: ncm: Fix NPE in gncm_bind
>>>         usb: gadget: f_ncm: Fix atomic context locking issue
>>>
>>>    drivers/usb/gadget/function/f_ncm.c            | 29 +++++++++++---------------
>>>    drivers/usb/gadget/function/u_ether_configfs.h | 11 +---------
>>>    drivers/usb/gadget/function/u_ncm.h            |  1 -
>>>    drivers/usb/gadget/legacy/ncm.c                | 13 +++++++++---
>>>    4 files changed, 23 insertions(+), 31 deletions(-)
>>> ---
>>> base-commit: da87d45b195148d670ab995367d52aa9e8a9a1fa
>>> change-id: 20260214-legacy-ncm-8c001295b343
>>>
>>> Best regards,
>>
>> --
>> David Heidelberg
>>

-- 
David Heidelberg
Re: [PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by David Heidelberg 1 month, 2 weeks ago
On 02/03/2026 16:41, David Heidelberg wrote:
> On 02/03/2026 11:15, Kuen-Han Tsai wrote:
> [...]>
>> David, could you share exactly what OS you are using (e.g.,
>> postmarketOS with an sdm845/6.18-dev tree)? Also, could you provide
>> some instructions on how to build the code and reproduce this problem
>> on a Pixel 3? If you have the time, it would be incredibly helpful if
>> you could dive into this a bit deeper on your device to see exactly
>> how the DHCP daemon is failing.
> 
> Hello Kuen-Han,
> 
> it's pmOS initrd, but generally I got reported same behaviour on Mobian (Mobile 
> Debian) too.
> 
> The Pixel 3 support was merged, so it can be reproduced with:
> 1. 7.0-rc1 tag
> 2. -next tree (latest tested is next-20260227)
> 3. our sdm845-next tree [1] (some WIP patches, working touchscreen on Pixel 3, 
> etc.), the tree currently contains the reverts

I forgot to mention, the build is just (considering option 3., which includes 
sdm845.config fragment):

make defconfig sdm845.config
make
mkbootimg (with params seen below)

deviceinfo_flash_offset_base="0x00000000"
deviceinfo_flash_offset_kernel="0x00008000"
deviceinfo_flash_offset_ramdisk="0x01000000"
deviceinfo_flash_offset_second="0x00000000"
deviceinfo_flash_offset_tags="0x00000100"

You can use Luca's [1] scripts for the processing kernel to be suitable for 
fastboot (appends initrd to boot.img) and uploading kernel modules to pmOS rootfs.

Options 1. and 2. won't work directly with fastboot, as chainloaded u-boot is 
needed due to fastboot requiring some ancient kernel offset to be in place (we 
keep patch to workaround it in sdm845-next).

David

[1] https://github.com/z3ntu/linux-mainline-scripts/

> 
> I can provide log with the patch [2], without the patch [3] (well, it's more 
> like nothing is in the log)
> 
> I'm very lightly familiar with usb subsystem, so if you give me hints what to 
> look for (or what to debug), I'll try find a moment to check to move this forward.
> 
> Thank you for working on improving usb gadgets!
> David
> 
> [1] https://codeberg.org/sdm845/linux/
> [2] https://paste.sr.ht/~okias/35982d7e284ee0f767e57923ced591beb4d3b238#L589
> [3] https://paste.sr.ht/~okias/4e9172a34e4093445536b51e935dbd229edad7b2#L613

[...]
Re: [PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by Kuen-Han Tsai 1 month, 2 weeks ago
On Tue, Mar 3, 2026 at 1:22 AM David Heidelberg <david@ixit.cz> wrote:
>
> On 02/03/2026 16:41, David Heidelberg wrote:
> > On 02/03/2026 11:15, Kuen-Han Tsai wrote:
> > [...]>
> >> David, could you share exactly what OS you are using (e.g.,
> >> postmarketOS with an sdm845/6.18-dev tree)? Also, could you provide
> >> some instructions on how to build the code and reproduce this problem
> >> on a Pixel 3? If you have the time, it would be incredibly helpful if
> >> you could dive into this a bit deeper on your device to see exactly
> >> how the DHCP daemon is failing.
> >
> > Hello Kuen-Han,
> >
> > it's pmOS initrd, but generally I got reported same behaviour on Mobian (Mobile
> > Debian) too.
> >
> > The Pixel 3 support was merged, so it can be reproduced with:
> > 1. 7.0-rc1 tag
> > 2. -next tree (latest tested is next-20260227)
> > 3. our sdm845-next tree [1] (some WIP patches, working touchscreen on Pixel 3,
> > etc.), the tree currently contains the reverts
>
> I forgot to mention, the build is just (considering option 3., which includes
> sdm845.config fragment):
>
> make defconfig sdm845.config
> make
> mkbootimg (with params seen below)
>
> deviceinfo_flash_offset_base="0x00000000"
> deviceinfo_flash_offset_kernel="0x00008000"
> deviceinfo_flash_offset_ramdisk="0x01000000"
> deviceinfo_flash_offset_second="0x00000000"
> deviceinfo_flash_offset_tags="0x00000100"
>
> You can use Luca's [1] scripts for the processing kernel to be suitable for
> fastboot (appends initrd to boot.img) and uploading kernel modules to pmOS rootfs.
>
> Options 1. and 2. won't work directly with fastboot, as chainloaded u-boot is
> needed due to fastboot requiring some ancient kernel offset to be in place (we
> keep patch to workaround it in sdm845-next).
>
> David
>
> [1] https://github.com/z3ntu/linux-mainline-scripts/
>
> >
> > I can provide log with the patch [2], without the patch [3] (well, it's more
> > like nothing is in the log)
> >
> > I'm very lightly familiar with usb subsystem, so if you give me hints what to
> > look for (or what to debug), I'll try find a moment to check to move this forward.
> >
> > Thank you for working on improving usb gadgets!
> > David
> >
> > [1] https://codeberg.org/sdm845/linux/
> > [2] https://paste.sr.ht/~okias/35982d7e284ee0f767e57923ced591beb4d3b238#L589
> > [3] https://paste.sr.ht/~okias/4e9172a34e4093445536b51e935dbd229edad7b2#L613
>
> [...]

Hi David,

Thanks for providing the detailed reproduction steps and logs; they
were incredibly helpful. I was able to reproduce the issue locally.

Looking at the failure logs, the regression's cause is the kernel
returning the name pattern "usb%d" via configfs instead of the
interface name "usb0". The pmOS DHCP daemon [1] relies on this string
to bind to the interface. I confirmed that providing the actual
interface name resolves the DHCP server startup failure.

However, investigating a fix revealed a deeper architectural flaw in
my patch. By deferring the net_device allocation to the bind() phase,
a single function instance will spawn multiple network devices if it
is symlinked to multiple USB configurations. Since configfs only
exposes a single ifname attribute per instance, it is fundamentally
impossible to accurately report the interface name when multiple
underlying network devices exist for that single instance.

All configurations tied to the same function instance are meant to
share a single network device, which is why the original design
correctly allocated it at the instance level. Because my approach
breaks this 1:1 mapping, I will send out a revert series in the next
few days.

Thanks again for catching this and helping track it down!

Regards,
Kuen-Han

[1] https://gitlab.com/postmarketOS/pmaports/-/blob/master/main/postmarketos-initramfs/init_functions.sh?ref_type=heads#L771
Re: [PATCH v2 0/2] usb: gadget: Fix g_ncm regression and atomic sleep in f_ncm
Posted by Greg Kroah-Hartman 1 month, 2 weeks ago
On Mon, Mar 02, 2026 at 06:15:19PM +0800, Kuen-Han Tsai wrote:
> Greg, do you have any thoughts on the best way to untangle this? I am
> fully willing to submit a revert for this series to restore the
> expected behavior while we figure out a proper architectural fix.

Please submit a revert so we can take the time to untangle this.

thanks,

greg k-h