[PATCH] netpoll: normalize skb->dev to the netpoll device

Zhang Cen posted 1 patch 4 weeks ago
There is a newer version of this series
Posted by Zhang Cen 4 weeks ago
__netpoll_send_skb() always transmits through np->dev and queues busy
packets on np->dev->npinfo->txq, but it leaves skb->dev unchanged.
Stacked callers such as DSA and macvlan can reach netpoll with skb->dev
still naming the upper device while np->dev is the lower device that
owns the netpoll state.

If the skb has to be deferred, queue_process() later dequeues it from
the lower device's txq but retries it through skb->dev. That can
re-enter the upper ndo_start_xmit path on an already transformed skb,
and if the upper device disappears before the lower txq drains, the
workqueue can dereference a stale skb->dev pointer.

The buggy scenario involves two paths, with each column showing the
order within that path:

Path A: netpoll enqueue path          Path B: upper-device teardown
1. A stacked ndo_start_xmit calls     1. Another task unregisters the
   netpoll_send_skb() on a lower         upper stacked net_device while
   device netpoll instance.              the lower npinfo stays alive.
2. __netpoll_send_skb() uses          2. free_netdev() releases the
   np->dev and queues the skb on         upper device.
   np->dev->npinfo->txq.
3. The queued skb still keeps         3. The lower txq still owns the
   skb->dev pointing at the upper        deferred skb.
   device.
4. queue_process() later dequeues     4. queue_process() dereferences
   from the lower txq and retries        that stale upper skb->dev.
   via skb->dev.
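
The interleaving above can be modeled in userspace C. This is only a
sketch under stated assumptions, not kernel code: model_dev, model_skb,
model_netpoll and both functions are hypothetical stand-ins for
struct net_device, struct sk_buff, struct netpoll and the enqueue and
queue_process() steps. A "registered" flag stands in for free_netdev(),
so the model shows which device the retry would touch without actually
reading freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; none of these are the kernel's real types. */
struct model_dev {
	const char *name;
	bool registered;            /* false once the device is torn down */
};

struct model_skb {
	struct model_dev *dev;      /* plays the role of skb->dev */
	struct model_skb *next;
};

struct model_netpoll {
	struct model_dev *dev;      /* plays the role of np->dev (lower) */
	struct model_skb *txq_head; /* deferred skbs owned by the lower dev */
	struct model_skb *txq_tail;
};

/* Path A, steps 2-3: enqueue on the lower txq, leaving skb->dev as is. */
static void model_enqueue_unpatched(struct model_netpoll *np,
				    struct model_skb *skb)
{
	assert(np && skb);
	skb->next = NULL;
	if (np->txq_tail)
		np->txq_tail->next = skb;
	else
		np->txq_head = skb;
	np->txq_tail = skb;
}

/*
 * Step 4: dequeue from the lower txq and report the device the retry
 * would go through, which is whatever skb->dev still names.
 */
static struct model_dev *model_queue_process_one(struct model_netpoll *np)
{
	struct model_skb *skb;

	assert(np);
	skb = np->txq_head;
	if (!skb)
		return NULL;
	np->txq_head = skb->next;
	if (!np->txq_head)
		np->txq_tail = NULL;
	return skb->dev;
}
```

If an skb whose dev field names the upper device is enqueued, and the
upper device is then marked unregistered, model_queue_process_one()
still returns the upper device. In the kernel that pointer would refer
to freed memory.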

Normalize skb->dev to np->dev before the direct transmit attempt and
before any fallback enqueue. This keeps both the immediate and deferred
netpoll paths in the same device and queue domain that already owns
npinfo->txq.
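
As a rough illustration of the normalization, the userspace sketch
below rebinds the skb to the netpoll device before either the direct
attempt or the fallback enqueue. All names (sketch_dev, sketch_skb,
sketch_netpoll, sketch_send_skb, sketch_retry_one) and the tx_busy
flag are assumptions made for the example. The real function also
handles locking, RCU and retry timing, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; none of these are the kernel's real types. */
struct sketch_dev {
	const char *name;
	bool tx_busy;               /* true forces the deferred path */
	unsigned int sent;          /* packets transmitted via this dev */
};

struct sketch_skb {
	struct sketch_dev *dev;     /* plays the role of skb->dev */
	struct sketch_skb *next;
};

struct sketch_netpoll {
	struct sketch_dev *dev;     /* plays the role of np->dev */
	struct sketch_skb *txq;     /* deferred list (LIFO for brevity;
				     * the real txq is a FIFO) */
};

/* Returns true if sent immediately, false if the skb was deferred. */
static bool sketch_send_skb(struct sketch_netpoll *np,
			    struct sketch_skb *skb)
{
	struct sketch_dev *dev;

	assert(np && np->dev && skb);
	dev = np->dev;
	/* The txq belongs to np->dev, so bind the skb to it up front. */
	skb->dev = dev;

	if (!dev->tx_busy) {
		dev->sent++;
		return true;
	}
	skb->next = np->txq;
	np->txq = skb;
	return false;
}

/* Deferred retry: transmit through skb->dev, as queue_process() does. */
static struct sketch_dev *sketch_retry_one(struct sketch_netpoll *np)
{
	struct sketch_skb *skb;

	assert(np);
	skb = np->txq;
	if (!skb)
		return NULL;
	np->txq = skb->next;
	skb->dev->sent++;
	return skb->dev;
}
```

With this ordering, an skb that arrives naming an upper device is
retried through the lower device even on the deferred path, so the
worker no longer depends on the upper device's lifetime.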

Sanitizer validation reported:
KASAN slab-use-after-free in queue_process()
Read of size 8
Call trace:
  dump_stack_lvl() (?:?)
  print_report() (?:?)
  srso_alias_return_thunk() (arch/x86/include/asm/nospec-branch.h:375)
  __virt_addr_valid() (?:?)
  kasan_complete_mode_report_info() (?:?)
  kasan_report() (?:?)
  queue_process() (net/core/netpoll.c:88)
  kasan_check_range() (?:?)
  __kasan_check_read() (?:?)
  process_one_work() (kernel/workqueue.c:3200)
  assign_work() (kernel/workqueue.c:1201)
  worker_thread() (?:?)
  kthread() (?:?)
  ret_from_fork() (?:?)
  __switch_to() (?:?)
  __switch_to_asm() (arch/x86/include/asm/switch_to.h:9)
  ret_from_fork_asm() (?:?)
  kasan_save_stack() (mm/kasan/common.c:52)
  kasan_save_track() (mm/kasan/common.c:74)
  kasan_save_free_info() (?:?)
  __kasan_slab_free() (?:?)
  kfree() (?:?)
  kvfree() (mm/slub.c:6876)
  netdev_release() (net/core/net-sysfs.c:2227)
  device_release() (?:?)
  kobject_put() (lib/kobject.c:730)
  put_device() (drivers/base/core.c:3810)
  free_netdev() (net/core/dev.c:12164)
  full_proxy_write() (?:?)
  vfs_write() (fs/read_write.c:668)
  ksys_write() (fs/read_write.c:729)
  __x64_sys_write() (?:?)
  x64_sys_call() (arch/x86/entry/syscall_64.c:35)
  do_syscall_64() (arch/x86/entry/syscall_64.c:87)
  entry_SYSCALL_64_after_hwframe() (?:?)

Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>

---
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -319,6 +319,8 @@
 	lockdep_assert_irqs_disabled();
 
 	dev = np->dev;
+	/* npinfo->txq belongs to np->dev, so retries must stay bound to it. */
+	skb->dev = dev;
 	rcu_read_lock();
 	npinfo = rcu_dereference_bh(dev->npinfo);
Re: [PATCH] netpoll: normalize skb->dev to the netpoll device
Posted by Jakub Kicinski 3 weeks, 3 days ago
On Fri, 15 May 2026 13:05:11 +0800 Zhang Cen wrote:
> Sanitizer validation reported:
> KASAN slab-use-after-free in queue_process()
> Read of size 8
> Call trace:
>   dump_stack_lvl() (?:?)
>   print_report() (?:?)
>   srso_alias_return_thunk() (arch/x86/include/asm/nospec-branch.h:375)
>   __virt_addr_valid() (?:?)
>   kasan_complete_mode_report_info() (?:?)
>   kasan_report() (?:?)
>   queue_process() (net/core/netpoll.c:88)
>   kasan_check_range() (?:?)
>   __kasan_check_read() (?:?)
>   process_one_work() (kernel/workqueue.c:3200)
>   assign_work() (kernel/workqueue.c:1201)
>   worker_thread() (?:?)
>   kthread() (?:?)
>   ret_from_fork() (?:?)
>   __switch_to() (?:?)
>   __switch_to_asm() (arch/x86/include/asm/switch_to.h:9)
>   ret_from_fork_asm() (?:?)
>   kasan_save_stack() (mm/kasan/common.c:52)
>   kasan_save_track() (mm/kasan/common.c:74)
>   kasan_save_free_info() (?:?)
>   __kasan_slab_free() (?:?)
>   kfree() (?:?)
>   kvfree() (mm/slub.c:6876)
>   netdev_release() (net/core/net-sysfs.c:2227)
>   device_release() (?:?)
>   kobject_put() (lib/kobject.c:730)
>   put_device() (drivers/base/core.c:3810)
>   free_netdev() (net/core/dev.c:12164)
>   full_proxy_write() (?:?)
>   vfs_write() (fs/read_write.c:668)
>   ksys_write() (fs/read_write.c:729)
>   __x64_sys_write() (?:?)
>   x64_sys_call() (arch/x86/entry/syscall_64.c:35)
>   do_syscall_64() (arch/x86/entry/syscall_64.c:87)
>   entry_SYSCALL_64_after_hwframe() (?:?)

You trimmed the stack trace too much, the information about 
the object on which the UAF was detected is missing, and 
so is the UAF location.

Please add a Fixes tag (even if it's the first commit in git history).

With that fixed please repost.
-- 
pw-bot: cr
Re: [PATCH] netpoll: normalize skb->dev to the netpoll device
Posted by Cen Zhang 3 weeks, 3 days ago
Hi Jakub,

Thanks for the review.

> You trimmed the stack trace too much, the information about
> the object on which the UAF was detected is missing, and
> so is the UAF location.
>
> Please add a Fixes tag (even if it's the first commit in git history).
>
> With that fixed please repost.

Sorry about that. I will include the missing KASAN object information,
the exact UAF location, and a Fixes tag in v2.

Best regards
Zhang Cen