From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH v2] fs: scale opening of character devices
Date: Fri, 21 Nov 2025 08:28:18 +0100
Message-ID: <20251121072818.3230541-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org

chrdev_open() always takes cdev_lock, which is only needed to synchronize
against cd_forget(). But the latter is only ever called by inode evict(),
while chrdev_open() runs with a reference held on the inode, meaning these
two can never legally race. Solidify this with asserts.

More cleanups are needed here but this is enough to get the thing out of
the way.

Rationale is funny-sounding at first: opening of /dev/null happens to be
a contention point in large-scale package building (think 100+ packages
at the same time with a thread count to support it). Such a workload is
not only very fork+exec heavy, but frequently involves scripts which use
the idiom of silencing output by redirecting it to /dev/null.

A non-large-scale microbenchmark of opening /dev/null in a loop in 16
processes:
before:	2865472
after:	4011960 (+40%)

Code goes from being bottlenecked on the spinlock to being bottlenecked
on lockref.

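For anyone wanting to reproduce a number of this kind, below is a minimal
sketch of such a microbenchmark. It is not the tool used to produce the
figures above: bench_open() and open_loop() are hypothetical helper names,
the batch size of 1024 is an arbitrary choice, and the result is reported
as opens per second, which may not match the unit of the figures quoted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in seconds. */
static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Open and close `path` in a loop for `seconds`; return iterations done. */
static uint64_t open_loop(const char *path, double seconds)
{
	double end = now_sec() + seconds;
	uint64_t n = 0;

	while (now_sec() < end) {
		/* Check the clock once per batch to keep timing overhead low. */
		for (int i = 0; i < 1024; i++) {
			int fd = open(path, O_RDONLY);

			if (fd < 0)
				return n;
			close(fd);
			n++;
		}
	}
	return n;
}

/*
 * Fork up to `nproc` workers that each run open_loop() and report their
 * count through a shared pipe. Returns total opens per second, or -1.0
 * if the pipe cannot be created.
 */
static double bench_open(const char *path, int nproc, double seconds)
{
	int pfd[2];
	uint64_t total = 0, n;

	assert(nproc > 0 && seconds > 0.0);
	if (pipe(pfd) < 0)
		return -1.0;
	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;	/* carry on with the workers we have */
		if (pid == 0) {
			close(pfd[0]);
			n = open_loop(path, seconds);
			/* An 8-byte pipe write is atomic (<= PIPE_BUF). */
			if (write(pfd[1], &n, sizeof(n)) != (ssize_t)sizeof(n))
				_exit(1);
			_exit(0);
		}
	}
	close(pfd[1]);
	/* read() returns 0 once every worker has exited and closed its end. */
	while (read(pfd[0], &n, sizeof(n)) == (ssize_t)sizeof(n))
		total += n;
	close(pfd[0]);
	while (wait(NULL) > 0)
		;
	return total / seconds;
}
```

Calling bench_open("/dev/null", 16, 10.0) would approximate the 16-process
setup described above; absolute numbers depend on the machine and kernel.
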
Signed-off-by: Mateusz Guzik
---

v2:
- add back new = NULL lost in refactoring

I'll note for those interested that my experience with the workload at
hand comes from FreeBSD, where I was surprised to find /dev/null on the
profile. Given that Linux is globally serializing on it, it has to be a
factor as well in this case.

 fs/char_dev.c        | 20 +++++++++++---------
 fs/inode.c           |  2 +-
 include/linux/cdev.h |  2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/char_dev.c b/fs/char_dev.c
index c2ddb998f3c9..9a6dfab084d1 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -374,15 +374,15 @@ static int chrdev_open(struct inode *inode, struct file *filp)
 {
 	const struct file_operations *fops;
 	struct cdev *p;
-	struct cdev *new = NULL;
 	int ret = 0;
 
-	spin_lock(&cdev_lock);
-	p = inode->i_cdev;
+	VFS_BUG_ON_INODE(icount_read(inode) < 1, inode);
+
+	p = READ_ONCE(inode->i_cdev);
 	if (!p) {
 		struct kobject *kobj;
+		struct cdev *new = NULL;
 		int idx;
-		spin_unlock(&cdev_lock);
 		kobj = kobj_lookup(cdev_map, inode->i_rdev, &idx);
 		if (!kobj)
 			return -ENXIO;
@@ -392,19 +392,19 @@ static int chrdev_open(struct inode *inode, struct file *filp)
 		   we dropped the lock. */
 		p = inode->i_cdev;
 		if (!p) {
-			inode->i_cdev = p = new;
+			p = new;
+			WRITE_ONCE(inode->i_cdev, p);
 			list_add(&inode->i_devices, &p->list);
 			new = NULL;
 		} else if (!cdev_get(p))
 			ret = -ENXIO;
+		spin_unlock(&cdev_lock);
+		cdev_put(new);
 	} else if (!cdev_get(p))
 		ret = -ENXIO;
-	spin_unlock(&cdev_lock);
-	cdev_put(new);
 	if (ret)
 		return ret;
 
-	ret = -ENXIO;
 	fops = fops_get(p->ops);
 	if (!fops)
 		goto out_cdev_put;
@@ -423,8 +423,10 @@ static int chrdev_open(struct inode *inode, struct file *filp)
 	return ret;
 }
 
-void cd_forget(struct inode *inode)
+void inode_cdev_forget(struct inode *inode)
 {
+	VFS_BUG_ON_INODE(!(inode_state_read_once(inode) & I_FREEING), inode);
+
 	spin_lock(&cdev_lock);
 	list_del_init(&inode->i_devices);
 	inode->i_cdev = NULL;
diff --git a/fs/inode.c b/fs/inode.c
index a62032864ddf..88be1f20782d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -840,7 +840,7 @@ static void evict(struct inode *inode)
 		clear_inode(inode);
 	}
 	if (S_ISCHR(inode->i_mode) && inode->i_cdev)
-		cd_forget(inode);
+		inode_cdev_forget(inode);
 
 	remove_inode_hash(inode);
 
diff --git a/include/linux/cdev.h b/include/linux/cdev.h
index 0e8cd6293deb..bed99967ad90 100644
--- a/include/linux/cdev.h
+++ b/include/linux/cdev.h
@@ -34,6 +34,6 @@ void cdev_device_del(struct cdev *cdev, struct device *dev);
 
 void cdev_del(struct cdev *);
 
-void cd_forget(struct inode *);
+void inode_cdev_forget(struct inode *);
 
 #endif
-- 
2.48.1
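
The fast path in chrdev_open() follows a familiar shape: read the cached
pointer without the lock, and only take the lock to populate it when it
is still NULL. Below is a rough userspace analogue of that shape, offered
as a sketch only. All names (struct dev, struct node, node_get_dev,
slow_lock) are hypothetical stand-ins, C11 acquire/release atomics are
used in place of READ_ONCE()/WRITE_ONCE(), and the reference counting
done by cdev_get()/cdev_put() is left out. As in the patch, the lockless
read is only sound if the pointer cannot be cleared while callers are
active, which the sketch simply assumes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct dev {				/* stand-in for struct cdev */
	int id;
};

struct node {				/* stand-in for struct inode */
	_Atomic(struct dev *) cached;	/* plays the role of inode->i_cdev */
};

/* Plays the role of cdev_lock: only taken on the slow path. */
static pthread_mutex_t slow_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the device cached in `n`, installing `candidate` if nothing is
 * cached yet. Once a pointer is published, callers never take the lock.
 */
static struct dev *node_get_dev(struct node *n, struct dev *candidate)
{
	struct dev *d;

	assert(n != NULL && candidate != NULL);

	/* Fast path: lockless read of the cached pointer. */
	d = atomic_load_explicit(&n->cached, memory_order_acquire);
	if (d)
		return d;

	pthread_mutex_lock(&slow_lock);
	/* Recheck under the lock in case another thread published first. */
	d = atomic_load_explicit(&n->cached, memory_order_relaxed);
	if (!d) {
		d = candidate;
		/* Publish so later callers can take the fast path. */
		atomic_store_explicit(&n->cached, d, memory_order_release);
	}
	pthread_mutex_unlock(&slow_lock);
	return d;
}
```

The first caller for a given node pays for the lock; every later caller
returns after a single load, which is why the contention moves elsewhere
once the pointer is populated.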