From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH] pidfs: implement ino allocation without the pidmap lock
Date: Tue, 20 Jan 2026 19:45:39 +0100
Message-ID: <20260120184539.1480930-1-mjguzik@gmail.com>

This paves the way for scalable PID allocation later.

The 32 bit variant merely takes a spinlock for simplicity; the 64 bit
variant uses a scalable scheme.

Signed-off-by: Mateusz Guzik
---

This patch assumes the rb -> rhashtable conversion has landed.

I booted the 32 bit code on the 64 bit kernel; I take it that it's fine.

I'm slightly worried about error handling. It seems
pid->pidfs_hash.next = NULL is supposed to sort it out. Given that an ino
of 0 is not legal, I think it should be used as a sentinel value for
presence in the table instead.

So something like:

alloc_pid:
	pid->ino = 0;
	....

then:

void pidfs_remove_pid(struct pid *pid)
{
	if (unlikely(!pid->ino))
		return;
	rhashtable_remove_fast(&pidfs_ino_ht, &pid->pidfs_hash,
			       pidfs_ino_ht_params);
}

 fs/pidfs.c   | 107 +++++++++++++++++++++++++++++++++++----------------
 kernel/pid.c |   3 +-
 2 files changed, 74 insertions(+), 36 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index 3da5e8e0a76b..46b46a484d45 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -65,7 +65,39 @@ static const struct rhashtable_params pidfs_ino_ht_params = {
 	.automatic_shrinking = true,
 };
 
+/*
+ * inode number handling
+ *
+ * On 64 bit nothing special happens. The 64bit number assigned
+ * to struct pid is the inode number.
+ *
+ * On 32 bit the 64 bit number assigned to struct pid is split
+ * into two 32 bit numbers. The lower 32 bits are used as the
+ * inode number and the upper 32 bits are used as the inode
+ * generation number.
+ *
+ * On 32 bit pidfs_ino() will return the lower 32 bit. When
+ * pidfs_ino() returns zero a wrap around happened. When a
+ * wraparound happens the 64 bit number will be incremented by 2
+ * so inode numbering starts at 2 again.
+ *
+ * On 64 bit comparing two pidfds is as simple as comparing
+ * inode numbers.
+ *
+ * When a wraparound happens on 32 bit multiple pidfds with the
+ * same inode number are likely to exist (This isn't a problem
+ * since before pidfs pidfds used the anonymous inode meaning
+ * all pidfds had the same inode number.). Userspace can
+ * reconstruct the 64 bit identifier by retrieving both the
+ * inode number and the inode generation number to compare or
+ * use file handles.
+ */
+
 #if BITS_PER_LONG == 32
+
+DEFINE_SPINLOCK(pidfs_ino_lock);
+static u64 pidfs_ino_nr = 2;
+
 static inline unsigned long pidfs_ino(u64 ino)
 {
 	return lower_32_bits(ino);
@@ -77,6 +109,18 @@ static inline u32 pidfs_gen(u64 ino)
 	return upper_32_bits(ino);
 }
 
+static inline u64 pidfs_alloc_ino(void)
+{
+	u64 ino;
+
+	spin_lock(&pidfs_ino_lock);
+	if (pidfs_ino(pidfs_ino_nr) == 0)
+		pidfs_ino_nr += 2;
+	ino = pidfs_ino_nr++;
+	spin_unlock(&pidfs_ino_lock);
+	return ino;
+}
+
 #else
 
 /* On 64 bit simply return ino. */
@@ -90,53 +134,48 @@ static inline u32 pidfs_gen(u64 ino)
 {
 	return 0;
 }
-#endif
 
 /*
- * Allocate inode number and initialize pidfs fields.
- * Called with pidmap_lock held.
+ * A patched up copy of get_next_ino(). Uses 64 bit, does not do overflow checks
+ * and guarantees ino of at least 2.
  */
-void pidfs_prepare_pid(struct pid *pid)
+#define LAST_INO_BATCH 1024
+static DEFINE_PER_CPU(u64, pidfs_last_ino);
+
+static u64 pidfs_alloc_ino(void)
 {
-	static u64 pidfs_ino_nr = 2;
+	u64 *p = &get_cpu_var(pidfs_last_ino);
+	u64 res = *p;
+
+#ifdef CONFIG_SMP
+	if (unlikely((res & (LAST_INO_BATCH-1)) == 0)) {
+		static atomic64_t pidfs_shared_last_ino = ATOMIC_INIT(2);
+		u64 next = atomic64_add_return(LAST_INO_BATCH, &pidfs_shared_last_ino);
+		res = next - LAST_INO_BATCH;
+	}
+#endif
 
-	/*
-	 * On 64 bit nothing special happens. The 64bit number assigned
-	 * to struct pid is the inode number.
-	 *
-	 * On 32 bit the 64 bit number assigned to struct pid is split
-	 * into two 32 bit numbers. The lower 32 bits are used as the
-	 * inode number and the upper 32 bits are used as the inode
-	 * generation number.
-	 *
-	 * On 32 bit pidfs_ino() will return the lower 32 bit. When
-	 * pidfs_ino() returns zero a wrap around happened. When a
-	 * wraparound happens the 64 bit number will be incremented by 2
-	 * so inode numbering starts at 2 again.
-	 *
-	 * On 64 bit comparing two pidfds is as simple as comparing
-	 * inode numbers.
-	 *
-	 * When a wraparound happens on 32 bit multiple pidfds with the
-	 * same inode number are likely to exist (This isn't a problem
-	 * since before pidfs pidfds used the anonymous inode meaning
-	 * all pidfds had the same inode number.). Userspace can
-	 * reconstruct the 64 bit identifier by retrieving both the
-	 * inode number and the inode generation number to compare or
-	 * use file handles.
-	 */
-	if (pidfs_ino(pidfs_ino_nr) == 0)
-		pidfs_ino_nr += 2;
+	res++;
+	*p = res;
+	put_cpu_var(pidfs_last_ino);
+	return res;
+}
+
+#endif
 
-	pid->ino = pidfs_ino_nr;
+/*
+ * Initialize pidfs fields.
+ */
+void pidfs_prepare_pid(struct pid *pid)
+{
 	pid->pidfs_hash.next = NULL;
 	pid->stashed = NULL;
 	pid->attr = NULL;
-	pidfs_ino_nr++;
 }
 
 int pidfs_add_pid(struct pid *pid)
 {
+	pid->ino = pidfs_alloc_ino();
 	return rhashtable_insert_fast(&pidfs_ino_ht, &pid->pidfs_hash,
 				      pidfs_ino_ht_params);
 }
diff --git a/kernel/pid.c b/kernel/pid.c
index 06356e40ac00..72c9372b84b8 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -198,6 +198,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
 		INIT_HLIST_HEAD(&pid->tasks[type]);
 	init_waitqueue_head(&pid->wait_pidfd);
 	INIT_HLIST_HEAD(&pid->inodes);
+	pidfs_prepare_pid(pid);
 
 	/*
 	 * 2. perm check checkpoint_restore_ns_capable()
@@ -314,8 +315,6 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
 	retval = -ENOMEM;
 	if (unlikely(!(ns->pid_allocated & PIDNS_ADDING)))
 		goto out_free;
-	pidfs_prepare_pid(pid);
-
 	for (upid = pid->numbers + ns->level; upid >= pid->numbers; --upid) {
 		/* Make the PID visible to find_pid_ns. */
 		idr_replace(&upid->ns->idr, pid, upid->nr);
-- 
2.48.1
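
For anyone who wants to poke at the numbering outside the kernel, here
is a rough single-threaded userspace model of the two allocation
schemes in the patch. It is only a sketch under stated assumptions: the
per-CPU variable is replaced by a plain array indexed by a hypothetical
"cpu" argument, the atomic and the spinlock are dropped because nothing
runs concurrently, and every model_* name and MODEL_NR_CPUS are made up
for illustration. It mirrors the CONFIG_SMP branch of the 64 bit code
only.

```c
#include <assert.h>
#include <stdint.h>

#define LAST_INO_BATCH 1024
#define MODEL_NR_CPUS  4	/* hypothetical; stands in for per-CPU storage */

/* Stand-ins for pidfs_shared_last_ino and the per-CPU pidfs_last_ino. */
static uint64_t model_shared_last_ino = 2;
static uint64_t model_last_ino[MODEL_NR_CPUS];

/* Models the 64 bit path with CONFIG_SMP; "cpu" picks the per-CPU slot. */
static uint64_t model_alloc_ino(unsigned int cpu)
{
	uint64_t res;

	assert(cpu < MODEL_NR_CPUS);
	res = model_last_ino[cpu];

	/* Slot empty or batch used up: reserve a fresh batch from the shared counter. */
	if ((res & (LAST_INO_BATCH - 1)) == 0) {
		model_shared_last_ino += LAST_INO_BATCH;
		res = model_shared_last_ino - LAST_INO_BATCH;
	}

	res++;
	model_last_ino[cpu] = res;
	return res;
}

/* Models the 32 bit path; the spinlock is omitted in this single-threaded model. */
static uint64_t model_ino_nr = 2;

static uint64_t model_alloc_ino_32(void)
{
	/* Lower 32 bits wrapped to zero: skip ahead so numbering restarts at 2. */
	if ((uint32_t)model_ino_nr == 0)
		model_ino_nr += 2;
	return model_ino_nr++;
}
```

In this model the first call on slot 0 returns 3 and the first call on
slot 1 returns 1027; once slot 0 has handed out 1024 its next value is
2051, so the slots never overlap. Setting model_ino_nr to 0xffffffff
makes the 32 bit model return 0xffffffff and then 0x100000002, that is
ino 2 with generation 1. Note the model does not cover a build without
CONFIG_SMP: with the refill branch compiled out, a zero-initialized
counter would hand out 1 first.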