From: Peilin Ye
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Jamal Hadi Salim, Cong Wang, Jiri Pirko
Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
    Pedro Tammela, Hillf Danton, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, Cong Wang, Peilin Ye
Subject: [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
Date: Fri, 5 May 2023 17:16:10 -0700
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in
ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap().  Similar for
clsact Qdiscs and miniq_egress.

Unfortunately, after introducing RTNL-lockless RTM_{NEW,DEL,GET}TFILTER
requests, when e.g. replacing ingress (clsact) Qdiscs, the old Qdisc could
access the same miniq_{in,e}gress pointer(s) concurrently with the new
Qdisc, causing race conditions [1] including a use-after-free in
mini_qdisc_pair_swap() reported by syzbot:

 BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
 Call Trace:
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
  print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
  print_report mm/kasan/report.c:430 [inline]
  kasan_report+0x11c/0x130 mm/kasan/report.c:536
  mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
  tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
  tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
  tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
  tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
  tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...

The new (ingress or clsact) Qdisc should only call mini_qdisc_pair_swap()
after the old Qdisc's last call (in {ingress,clsact}_destroy()) has
finished.

To achieve this, in qdisc_graft(), return -EBUSY if the old (ingress or
clsact) Qdisc has ongoing RTNL-lockless filter requests, and call
qdisc_destroy() for "old" before grafting "new".

Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for RTNL-lockless filter requests.  Introduce
a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.
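
For readers less familiar with the two refcount helpers, here is a minimal
userspace sketch of how "increment unless zero" and "decrement only if one"
pair up.  It is an illustration built on C11 atomics, not the kernel's
refcount_t implementation; struct model_ref and all model_ref_*() names are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a Qdisc's refcount; not kernel code. */
struct model_ref {
	atomic_int refcnt;
};

/* Take a reference unless the count has already dropped to zero.
 * Models what an RTNL-lockless filter request does when it looks up a Qdisc.
 */
bool model_ref_inc_nz(struct model_ref *r)
{
	int old = atomic_load(&r->refcnt);

	while (old != 0) {
		/* On failure, "old" is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop the reference only if the caller holds the last one.
 * Models the check that makes a replace attempt fail with -EBUSY.
 */
bool model_ref_dec_if_one(struct model_ref *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&r->refcnt, &expected, 0);
}

/* Walk through one possible sequence of events. */
void model_ref_demo(void)
{
	struct model_ref q;

	atomic_init(&q.refcnt, 1);             /* grafted, no filter requests  */
	assert(model_ref_inc_nz(&q));          /* filter request starts: 1 -> 2 */
	assert(!model_ref_dec_if_one(&q));     /* replace must back off         */
	atomic_fetch_sub(&q.refcnt, 1);        /* filter request ends: 2 -> 1   */
	assert(model_ref_dec_if_one(&q));      /* replace may proceed: 1 -> 0   */
	assert(!model_ref_inc_nz(&q));         /* late lookup cannot revive it  */
}
```

Once the count reaches zero through the "dec if one" path, a later
"inc unless zero" fails, so no new lockless user can attach to the object
that is about to be destroyed.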

[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after patch "net/sched: sch_ingress: Only
create under TC_H_INGRESS") on eth0 that has 8 transmission queues:

  Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
  adds a flower filter X to A.

  Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
  b2) to replace A, then adds a flower filter Y to B.

 Thread 1               A's refcnt   Thread 2
  RTM_NEWQDISC (A, RTNL-locked)
   qdisc_create(A)               1
   qdisc_graft(A)                9

  RTM_NEWTFILTER (X, RTNL-lockless)
   __tcf_qdisc_find(A)          10
   tcf_chain0_head_change(A)
   mini_qdisc_pair_swap(A) (1st)
            |
            |                         RTM_NEWQDISC (B, RTNL-locked)
           RCU                   2     qdisc_graft(B)
            |                    1     notify_and_destroy(A)
            |
   tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-lockless)
   qdisc_destroy(A)                    tcf_chain0_head_change(B)
   tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
   mini_qdisc_pair_swap(A) (3rd)                |
           ...                                 ...

Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
on eth0 will not find filter Y in sch_handle_ingress().

This is just one of the possible consequences of concurrently accessing
net_device::miniq_{in,e}gress pointers.  The point is clear, however: B's
first call to mini_qdisc_pair_swap() should take place after A's last
call, in qdisc_destroy().
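
The ordering in [1] can be replayed single-threaded in userspace to see why
the slot ends up NULL.  The sketch below is a deliberately simplified model:
the model_* types and functions are hypothetical, RCU and locking are left
out, and model_pair_swap() only mimics the "publish one of my two mini
Qdiscs, or NULL when there are no filters" behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct model_miniq {
	const char *name;
};

struct model_dev {
	struct model_miniq *miniq_ingress;      /* per-device slot */
};

struct model_pair {
	struct model_miniq miniq1;
	struct model_miniq miniq2;
	struct model_miniq **p_miniq;           /* points at the device slot */
};

void model_pair_init(struct model_pair *p, struct model_dev *dev,
		     const char *n1, const char *n2)
{
	p->miniq1.name = n1;
	p->miniq2.name = n2;
	p->p_miniq = &dev->miniq_ingress;       /* both Qdiscs share this slot */
}

/* Publish one of the pair's mini Qdiscs, or NULL if no filters remain. */
void model_pair_swap(struct model_pair *p, int has_filters)
{
	struct model_miniq *cur = *p->p_miniq;

	if (!has_filters) {
		*p->p_miniq = NULL;
		return;
	}
	*p->p_miniq = (cur != &p->miniq1) ? &p->miniq1 : &p->miniq2;
}

/* Order from [1]: A (1st), B (2nd), then A again while being destroyed. */
struct model_miniq *model_replay_racy(struct model_dev *dev,
				      struct model_pair *a,
				      struct model_pair *b)
{
	model_pair_swap(a, 1);                  /* 1st: filter X added to A   */
	model_pair_swap(b, 1);                  /* 2nd: filter Y added to B   */
	model_pair_swap(a, 0);                  /* 3rd: A's destroy clears it */
	return dev->miniq_ingress;
}

/* Intended order: A's last call finishes before B's first call. */
struct model_miniq *model_replay_ordered(struct model_dev *dev,
					 struct model_pair *a,
					 struct model_pair *b)
{
	model_pair_swap(a, 1);                  /* filter X added to A        */
	model_pair_swap(a, 0);                  /* A destroyed first          */
	model_pair_swap(b, 1);                  /* then filter Y added to B   */
	return dev->miniq_ingress;
}
```

In this model the racy order leaves the slot NULL although B still has a
filter, while the intended order leaves it pointing at B's first mini Qdisc.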

Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/0000000000006cf87705f79acf1a@google.com
Cc: Hillf Danton
Signed-off-by: Peilin Ye
Acked-by: Jamal Hadi Salim
Reviewed-by: Jamal Hadi Salim
Tested-by: Pedro Tammela
---
 include/net/sch_generic.h |  8 ++++++++
 net/sched/sch_api.c       | 26 +++++++++++++++++++++-----
 net/sched/sch_generic.c   | 14 +++++++++++---
 3 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fab5ba3e61b7..3e9cc43cbc90 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -137,6 +137,13 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
 	refcount_inc(&qdisc->refcnt);
 }
 
+static inline bool qdisc_refcount_dec_if_one(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return true;
+	return refcount_dec_if_one(&qdisc->refcnt);
+}
+
 /* Intended to be used by unlocked users, when concurrent qdisc release is
  * possible.
  */
@@ -652,6 +659,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
+void qdisc_destroy(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
 void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f72a581666a2..a2d07bc8ded6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1080,10 +1080,20 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		if ((q && q->flags & TCQ_F_INGRESS) ||
 		    (new && new->flags & TCQ_F_INGRESS)) {
 			ingress = 1;
-			if (!dev_ingress_queue(dev)) {
+			dev_queue = dev_ingress_queue(dev);
+			if (!dev_queue) {
 				NL_SET_ERR_MSG(extack, "Device does not have an ingress queue");
 				return -ENOENT;
 			}
+
+			/* This is the counterpart of that qdisc_refcount_inc_nz() call in
+			 * __tcf_qdisc_find() for RTNL-lockless filter requests.
+			 */
+			if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping)) {
+				NL_SET_ERR_MSG(extack,
+					       "Current ingress or clsact Qdisc has ongoing filter request(s)");
+				return -EBUSY;
+			}
 		}
 
 		if (dev->flags & IFF_UP)
@@ -1104,8 +1114,16 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 					qdisc_put(old);
 			}
 		} else {
-			dev_queue = dev_ingress_queue(dev);
-			old = dev_graft_qdisc(dev_queue, new);
+			old = dev_graft_qdisc(dev_queue, NULL);
+
+			/* {ingress,clsact}_destroy() "old" before grafting "new" to avoid
+			 * unprotected concurrent accesses to net_device::miniq_{in,e}gress
+			 * pointer(s) in mini_qdisc_pair_swap().
+			 */
+			qdisc_notify(net, skb, n, classid, old, new, extack);
+			qdisc_destroy(old);
+
+			dev_graft_qdisc(dev_queue, new);
 		}
 
 skip:
@@ -1119,8 +1137,6 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
 			if (new && new->ops->attach)
 				new->ops->attach(new);
-		} else {
-			notify_and_destroy(net, skb, n, classid, old, new, extack);
 		}
 
 		if (dev->flags & IFF_UP)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 37e41f972f69..e14ed47f961c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1046,7 +1046,7 @@ static void qdisc_free_cb(struct rcu_head *head)
 	qdisc_free(q);
 }
 
-static void qdisc_destroy(struct Qdisc *qdisc)
+static void __qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops *ops = qdisc->ops;
 
@@ -1070,6 +1070,14 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 	call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
+void qdisc_destroy(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return;
+
+	__qdisc_destroy(qdisc);
+}
+
 void qdisc_put(struct Qdisc *qdisc)
 {
 	if (!qdisc)
@@ -1079,7 +1087,7 @@ void qdisc_put(struct Qdisc *qdisc)
 	    !refcount_dec_and_test(&qdisc->refcnt))
 		return;
 
-	qdisc_destroy(qdisc);
+	__qdisc_destroy(qdisc);
 }
 EXPORT_SYMBOL(qdisc_put);
 
@@ -1094,7 +1102,7 @@ void qdisc_put_unlocked(struct Qdisc *qdisc)
 	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
 		return;
 
-	qdisc_destroy(qdisc);
+	__qdisc_destroy(qdisc);
 	rtnl_unlock();
 }
 EXPORT_SYMBOL(qdisc_put_unlocked);
-- 
2.20.1