From: Mukesh Ojha
CC: Mukesh Ojha
Subject: [RESEND PATCH v5] devcoredump : Serialize devcd_del work
Date: Tue, 13 Sep 2022 18:20:24 +0530
Message-ID: <1663073424-13663-1-git-send-email-quic_mojha@quicinc.com>
X-Mailer: git-send-email 2.7.4
X-Mailing-List: linux-kernel@vger.kernel.org

In the following scenario (see the diagram), thread X running
dev_coredumpm() adds the devcd device to the framework, which sends a
uevent notification to userspace. Another thread Y reads this uevent and
calls devcd_data_write(), which eventually tries to delete the queued
timer before it has been initialized/queued.

So, debug objects reports a warning and, in the meantime, the timer is
initialized and queued from the X path. From the Y path it then gets
reinitialized, which sets timer->entry.pprev=NULL, and
try_to_grab_pending() gets stuck.

To fix this, introduce a mutex and a boolean flag to serialize the behaviour.

        cpu0(X)                                 cpu1(Y)

    dev_coredump() uevent sent to user space
    device_add()  ======================> user space process Y reads the
                                          uevent and writes to the devcd fd,
                                          which results in a call to

                                         devcd_data_write()
                                           mod_delayed_work()
                                             try_to_grab_pending()
                                               del_timer()
                                                 debug_assert_init()
   INIT_DELAYED_WORK()
   schedule_delayed_work()
                                                   debug_object_fixup()
                                                     timer_fixup_assert_init()
                                                       timer_setup()
                                                         do_init_timer()
                                                       /*
                                                        Above call reinitializes
                                                        the timer to
                                                        timer->entry.pprev=NULL
                                                        and this will be checked
                                                        later in timer_pending() call.
                                                      */
                                                 timer_pending()
                                                  !hlist_unhashed_lockless(&timer->entry)
                                                    !h->pprev
                                                /*
                                                  del_timer() checks h->pprev and finds
                                                  it to be NULL, due to which
                                                  try_to_grab_pending() gets stuck.
                                                */

Link: https://lore.kernel.org/lkml/2e1f81e2-428c-f11f-ce92-eb11048cb271@quicinc.com/
Signed-off-by: Mukesh Ojha
---
Hi,

Resending it after rebase. There is no change compared to the last v5 version.

v4->v5:
 - Rebased it.

v3->v4:
 - Renamed the flg variable to delete_work.

v2->v3:
 Addressed comments from gregkh
 - Wrapped the commit text and corrected the alignment.
 - Described the reason for introducing the new variables.
 - Restored the blank line.
 - Renamed del_wk_queued to flg.
 Addressed comments from tglx
 - Added a comment explaining a race between disabled_store and the
   devcd_del work that looks obvious but cannot actually occur.

v1->v2:
 - Added del_wk_queued flag to serialize the race between devcd_data_write()
   and disabled_store() => devcd_free().

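For readers who want to try the idea outside the kernel, here is a minimal
userspace sketch of the mutex-plus-flag serialization described above. It
is not the kernel code: it assumes pthreads in place of the kernel mutex and
delayed work, and struct entry, work_ready, delete_requests,
request_delete() and run_demo() are hypothetical names made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WRITERS 16

/* Hypothetical userspace stand-in for struct devcd_entry. */
struct entry {
	pthread_mutex_t mutex;	/* serializes setup against delete requests */
	bool delete_work;	/* set once a delete has been requested */
	bool work_ready;	/* stands in for INIT_DELAYED_WORK() having run */
	int delete_requests;	/* how many times the delete was really queued */
};

/* Rough analogue of devcd_data_write(): queue the delete at most once. */
static void *request_delete(void *arg)
{
	struct entry *e = arg;

	pthread_mutex_lock(&e->mutex);
	if (!e->delete_work) {
		e->delete_work = true;
		/* Setup holds the mutex until the work is ready. */
		assert(e->work_ready);
		e->delete_requests++;	/* stands in for mod_delayed_work() */
	}
	pthread_mutex_unlock(&e->mutex);
	return NULL;
}

/*
 * Rough analogue of dev_coredumpm(): make the entry visible to writers
 * while holding the mutex and release it only after the work is set up.
 * Returns the number of delete requests that were actually queued.
 */
static int run_demo(int nwriters)
{
	struct entry e = { .delete_work = false, .work_ready = false };
	pthread_t tid[MAX_WRITERS];
	int created = 0;
	int i;

	if (nwriters > MAX_WRITERS)
		nwriters = MAX_WRITERS;

	pthread_mutex_init(&e.mutex, NULL);

	pthread_mutex_lock(&e.mutex);
	/* "device_add()": from here on writers may start running. */
	for (i = 0; i < nwriters; i++) {
		if (pthread_create(&tid[created], NULL, request_delete, &e) == 0)
			created++;
	}
	e.work_ready = true;	/* "INIT_DELAYED_WORK()" */
	pthread_mutex_unlock(&e.mutex);

	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&e.mutex);
	return e.delete_requests;
}
```

Calling run_demo() with several writers shows the two properties the patch
relies on: no writer touches the work before setup has finished, and the
flag ensures that only the first request queues the delete.
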
 drivers/base/devcoredump.c | 83 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index f4d794d..1c06781 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -25,6 +25,47 @@ struct devcd_entry {
 	struct device devcd_dev;
 	void *data;
 	size_t datalen;
+	/*
+	 * Here, mutex is required to serialize the calls to del_wk work between
+	 * user/kernel space which happens when devcd is added with device_add()
+	 * and that sends uevent to user space. User space reads the uevents,
+	 * and calls to devcd_data_write() which try to modify the work which is
+	 * not even initialized/queued from devcoredump.
+	 *
+	 *
+	 *
+	 *        cpu0(X)                                 cpu1(Y)
+	 *
+	 *        dev_coredump() uevent sent to user space
+	 *        device_add()  ======================> user space process Y reads the
+	 *                                              uevents writes to devcd fd
+	 *                                              which results into writes to
+	 *
+	 *                                             devcd_data_write()
+	 *                                               mod_delayed_work()
+	 *                                                 try_to_grab_pending()
+	 *                                                   del_timer()
+	 *                                                     debug_assert_init()
+	 *       INIT_DELAYED_WORK()
+	 *       schedule_delayed_work()
+	 *
+	 *
+	 * Also, mutex alone would not be enough to avoid scheduling of
+	 * del_wk work after it get flush from a call to devcd_free()
+	 * mentioned as below.
+	 *
+	 *	disabled_store()
+	 *        devcd_free()
+	 *          mutex_lock()             devcd_data_write()
+	 *          flush_delayed_work()
+	 *          mutex_unlock()
+	 *                                   mutex_lock()
+	 *                                   mod_delayed_work()
+	 *                                   mutex_unlock()
+	 * So, delete_work flag is required.
+	 */
+	struct mutex mutex;
+	bool delete_work;
 	struct module *owner;
 	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			void *data, size_t datalen);
@@ -84,7 +125,12 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
 	struct device *dev = kobj_to_dev(kobj);
 	struct devcd_entry *devcd = dev_to_devcd(dev);
 
-	mod_delayed_work(system_wq, &devcd->del_wk, 0);
+	mutex_lock(&devcd->mutex);
+	if (!devcd->delete_work) {
+		devcd->delete_work = true;
+		mod_delayed_work(system_wq, &devcd->del_wk, 0);
+	}
+	mutex_unlock(&devcd->mutex);
 
 	return count;
 }
@@ -112,7 +158,12 @@ static int devcd_free(struct device *dev, void *data)
 {
 	struct devcd_entry *devcd = dev_to_devcd(dev);
 
+	mutex_lock(&devcd->mutex);
+	if (!devcd->delete_work)
+		devcd->delete_work = true;
+
 	flush_delayed_work(&devcd->del_wk);
+	mutex_unlock(&devcd->mutex);
 	return 0;
 }
 
@@ -122,6 +173,30 @@ static ssize_t disabled_show(struct class *class, struct class_attribute *attr,
 	return sysfs_emit(buf, "%d\n", devcd_disabled);
 }
 
+/*
+ *
+ *	disabled_store()                                	worker()
+ *	 class_for_each_device(&devcd_class,
+ *		NULL, NULL, devcd_free)
+ *         ...
+ *         ...
+ *	   while ((dev = class_dev_iter_next(&iter))
+ *                                                             devcd_del()
+ *                                                               device_del()
+ *                                                                 put_device() <- last reference
+ *             error = fn(dev, data)                           devcd_dev_release()
+ *             devcd_free(dev, data)                           kfree(devcd)
+ *             mutex_lock(&devcd->mutex);
+ *
+ *
+ * In the above diagram, It looks like disabled_store() would be racing with parallely
+ * running devcd_del() and result in memory abort while acquiring devcd->mutex which
+ * is called after kfree of devcd memory  after dropping its last reference with
+ * put_device(). However, this will not happens as fn(dev, data) runs
+ * with its own reference to device via klist_node so it is not its last reference.
+ * so, above situation would not occur.
+ */
+
 static ssize_t disabled_store(struct class *class, struct class_attribute *attr,
 			      const char *buf, size_t count)
 {
@@ -278,13 +353,16 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	devcd->read = read;
 	devcd->free = free;
 	devcd->failing_dev = get_device(dev);
+	devcd->delete_work = false;
 
+	mutex_init(&devcd->mutex);
 	device_initialize(&devcd->devcd_dev);
 
 	dev_set_name(&devcd->devcd_dev, "devcd%d",
 		     atomic_inc_return(&devcd_count));
 	devcd->devcd_dev.class = &devcd_class;
 
+	mutex_lock(&devcd->mutex);
 	if (device_add(&devcd->devcd_dev))
 		goto put_device;
 
@@ -301,10 +379,11 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 
 	INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
 	schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
-
+	mutex_unlock(&devcd->mutex);
 	return;
  put_device:
 	put_device(&devcd->devcd_dev);
+	mutex_unlock(&devcd->mutex);
  put_module:
 	module_put(owner);
  free:
-- 
2.7.4