From: Russ Fellows
To: linux-fsdevel@vger.kernel.org
Cc: miklos@szeredi.hu, linux-kernel@vger.kernel.org, Russ Fellows,
    stable@vger.kernel.org
Subject: [PATCH 1/2] fuse: fix FOPEN_PARALLEL_DIRECT_WRITES being ignored for passthrough writes
Date: Fri, 29 May 2026 03:19:15 +0000
Message-ID: <20260529031918.7361-2-russ.fellows@gmail.com>
In-Reply-To: <20260529031918.7361-1-russ.fellows@gmail.com>

FOPEN_PARALLEL_DIRECT_WRITES has no effect on passthrough-backed FUSE
files due to two independent bugs that each prevent it from working.
Both must be fixed to restore parallel write concurrency.

Bug 1: fuse_passthrough_write_iter() acquires the exclusive inode lock
directly:

	inode_lock(inode);
	ret = backing_file_write_iter(...);
	inode_unlock(inode);

This serializes all concurrent writers regardless of whether the server
set FOPEN_PARALLEL_DIRECT_WRITES. The flag is checked by
fuse_dio_wr_exclusive_lock(), called from fuse_dio_lock(), called from
fuse_direct_write_iter() -- the FOPEN_DIRECT_IO write path.
fuse_file_write_iter() routes passthrough opens to
fuse_passthrough_write_iter() instead, bypassing the flag check entirely.

Bug 2: fuse_file_io_open() in iomode.c strips FOPEN_PARALLEL_DIRECT_WRITES
from any open that lacks FOPEN_DIRECT_IO:

	if (!(ff->open_flags & FOPEN_DIRECT_IO))
		ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES;

This is correct for regular direct-I/O opens, where FOPEN_DIRECT_IO
guarantees that the file bypasses the page cache for the lifetime of the
open (a plain O_DIRECT open could later drop O_DIRECT via fcntl()). It is
wrong for passthrough opens: a passthrough file already bypasses the FUSE
page cache by definition, so FOPEN_DIRECT_IO is redundant and should not
be required to preserve the parallel-writes flag.

Note: adding FOPEN_DIRECT_IO to the daemon's open flags is not a valid
workaround. fuse_file_write_iter() checks FOPEN_DIRECT_IO before
FOPEN_PASSTHROUGH, so setting both routes writes through
fuse_direct_write_iter() (which requires a userspace round-trip) instead
of fuse_passthrough_write_iter() (the zero-copy in-kernel path).

Combined effect: when a daemon opens with FOPEN_PASSTHROUGH |
FOPEN_PARALLEL_DIRECT_WRITES (without FOPEN_DIRECT_IO), Bug 2 strips the
parallel flag at open time, before Bug 1 is even reached. Both bugs must
be fixed together.

Fix Bug 1: make fuse_dio_lock() and fuse_dio_unlock() non-static and call
them from fuse_passthrough_write_iter() in place of the open-coded
inode_lock()/inode_unlock(). This reuses the existing logic that handles
FOPEN_PARALLEL_DIRECT_WRITES, appending writes, writes past EOF, and
page-cache I/O mode transitions.

Fix Bug 2: skip stripping FOPEN_PARALLEL_DIRECT_WRITES when
FOPEN_PASSTHROUGH is set. The flag is still stripped for non-passthrough
opens without FOPEN_DIRECT_IO, preserving existing behaviour.

Safety: backing_file_write_iter() calls into the backing filesystem's
write_iter (e.g. xfs_file_write_iter()), which acquires the backing
inode's own lock independently.
The FUSE inode lock and the backing inode lock are entirely separate;
taking inode_lock_shared() on the FUSE inode does not affect the backing
filesystem's concurrency control.

Fixes: 4d99ff8f6b85 ("fuse: implement open/create with FOPEN_PASSTHROUGH")
Cc: stable@vger.kernel.org
Signed-off-by: Russ Fellows
---
 fs/fuse/file.c        | 6 +++---
 fs/fuse/fuse_i.h      | 2 ++
 fs/fuse/iomode.c      | 7 +++++--
 fs/fuse/passthrough.c | 6 +++---
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f94f3dc082c6..602c3f18676e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1428,8 +1428,8 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	return false;
 }
 
-static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
-			  bool *exclusive)
+void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
+		   bool *exclusive)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct fuse_inode *fi = get_fuse_inode(inode);
@@ -1455,7 +1455,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
 	}
 }
 
-static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
+void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct fuse_inode *fi = get_fuse_inode(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 17423d4e3cfa..120de517cea0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1541,6 +1541,8 @@ int fuse_file_io_open(struct file *file, struct inode *inode);
 void fuse_file_io_release(struct fuse_file *ff, struct inode *inode);
 
 /* file.c */
+void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, bool *exclusive);
+void fuse_dio_unlock(struct kiocb *iocb, bool exclusive);
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 				 unsigned int open_flags, bool isdir);
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index c99e285f3..b3f51e3d1 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -214,10 +214,13 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
 	if (fuse_inode_backing(fi) && !(ff->open_flags & FOPEN_PASSTHROUGH))
 		goto fail;
 
 	/*
-	 * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO.
+	 * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO, except for
+	 * passthrough opens, which bypass the page cache regardless and do not
+	 * need FOPEN_DIRECT_IO to guarantee direct I/O semantics.
 	 */
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (!(ff->open_flags & FOPEN_DIRECT_IO) &&
+	    !(ff->open_flags & FOPEN_PASSTHROUGH))
 		ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES;
 
 	/*
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index f2d08ac2459b..f83d0a27cfb9 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -54,11 +54,11 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb,
 				    struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	struct inode *inode = file_inode(file);
 	struct fuse_file *ff = file->private_data;
 	struct file *backing_file = fuse_file_passthrough(ff);
 	size_t count = iov_iter_count(iter);
 	ssize_t ret;
+	bool exclusive;
 	struct backing_file_ctx ctx = {
 		.cred = ff->cred,
 		.end_write = fuse_passthrough_end_write,
@@ -70,10 +70,10 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb,
 	if (!count)
 		return 0;
 
-	inode_lock(inode);
+	fuse_dio_lock(iocb, iter, &exclusive);
 	ret = backing_file_write_iter(backing_file, iter, iocb, iocb->ki_flags,
 				      &ctx);
-	inode_unlock(inode);
+	fuse_dio_unlock(iocb, exclusive);
 
 	return ret;
 }
-- 
2.51.0

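A small userspace model of the two checks involved may help when reviewing patch 1/2. It is a sketch under stated assumptions, not kernel code: the MODEL_* flag values and the model_* helpers are hypothetical, and the lock-selection checks follow the list given in the description above (server opt-in, append, page-cache I/O mode, writes past EOF) rather than being copied from fs/fuse/file.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the FOPEN_* bits; the real values live in
 * include/uapi/linux/fuse.h and are not reproduced here.
 */
#define MODEL_DIRECT_IO               (1u << 0)
#define MODEL_PASSTHROUGH             (1u << 1)
#define MODEL_PARALLEL_DIRECT_WRITES  (1u << 2)

/* Open-time filtering as fuse_file_io_open() does it with the Bug 2 fix. */
static unsigned int model_effective_open_flags(unsigned int flags)
{
	if (!(flags & MODEL_DIRECT_IO) && !(flags & MODEL_PASSTHROUGH))
		flags &= ~MODEL_PARALLEL_DIRECT_WRITES;
	return flags;
}

/* Inputs the exclusive-vs-shared decision looks at for a single write. */
struct model_write {
	unsigned int open_flags;  /* flags left after open-time filtering */
	bool append;              /* IOCB_APPEND */
	bool cache_io_mode;       /* FUSE_I_CACHE_IO_MODE set on the inode */
	long long pos;            /* iocb->ki_pos */
	long long len;            /* iov_iter_count(from) */
	long long i_size;         /* i_size_read(inode) */
};

/* Returns true when the write must take the exclusive inode lock. */
static bool model_needs_exclusive_lock(const struct model_write *w)
{
	assert(w->pos >= 0 && w->len >= 0);

	if (!(w->open_flags & MODEL_PARALLEL_DIRECT_WRITES))
		return true;            /* server did not opt in */
	if (w->append)
		return true;            /* needs to know the final EOF */
	if (w->cache_io_mode)
		return true;            /* page-cache I/O is active */
	if (w->pos + w->len > w->i_size)
		return true;            /* write past EOF */
	return false;                   /* shared lock is enough */
}
```

With the Bug 2 fix, FOPEN_PASSTHROUGH | FOPEN_PARALLEL_DIRECT_WRITES keeps the parallel bit, so a 4K write inside EOF can take the shared path, while appending or past-EOF writes still fall back to the exclusive lock.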
From: Russ Fellows
To: linux-fsdevel@vger.kernel.org
Cc: miklos@szeredi.hu, linux-kernel@vger.kernel.org, Russ Fellows
Subject: [PATCH 2/2] fuse: reduce fi->lock contention on parallel direct I/O
Date: Fri, 29 May 2026 03:19:16 +0000
Message-ID: <20260529031918.7361-3-russ.fellows@gmail.com>
In-Reply-To: <20260529031918.7361-1-russ.fellows@gmail.com>

On the parallel passthrough write path, the original code acquires
fi->lock three times per I/O:

  1. fuse_inode_uncached_io_start() -- decrement iocachectr
  2. fuse_write_update_attr()       -- bump attr_version, check i_size
  3. fuse_inode_uncached_io_end()   -- increment iocachectr, wake waiters

At 1.7M IOPS (numjobs=8, iodepth=64, 4K) this amounts to ~5.1M spinlock
acquisitions per second on a single cache line. While the exclusive inode
lock removed by patch 1/2 was the primary bottleneck, this patch
eliminates the remaining fi->lock overhead on the hot path.

Convert iocachectr from int to atomic_t and add lockless fast paths:

fuse_inode_uncached_io_start(fb=NULL): use an atomic_try_cmpxchg() loop
to check and decrement the counter without fi->lock.
fi->lock is still taken for the 0→-1 transition (the first uncached
user), so that entering uncached mode stays serialized with
fuse_file_cached_io_open(), and for backing-file manipulation.

fuse_inode_uncached_io_end(): use atomic_inc_return() to detect the
still-in-flight case (counter still negative after the increment)
without taking the lock. fi->lock is only taken when the counter reaches
zero, to serialize wake_up() and clearing the backing file with
concurrent opens.

fuse_write_update_attr(): skip fi->lock for the common within-EOF case.
Use WRITE_ONCE() for fi->attr_version (some readers already access it
without fi->lock, e.g. inode.c:355 and dir.c:2069). fi->lock is only
taken when pos > i_size, with a re-check under the lock to handle races
near EOF. Parallel direct writes are only allowed when fuse_io_past_eof()
returns false (see fuse_dio_wr_exclusive_lock() and fuse_dio_lock()), so
this slow path is not taken on the parallel hot path.

All existing call sites that access iocachectr under fi->lock are updated
to use the atomic API (atomic_read/inc/dec). The atomic operations are
still needed there, because the fast paths above now modify the counter
without holding fi->lock.

Signed-off-by: Russ Fellows
---
 fs/fuse/file.c   | 31 +++++++++++++++++++++++--------
 fs/fuse/fuse_i.h |  9 +++++++--
 fs/fuse/inode.c  |  2 +-
 fs/fuse/iomode.c | 53 +++++++++++++++++++++++++++++++++++++++--------------
 4 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 602c3f18676e..73f870099 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1115,16 +1115,29 @@ bool fuse_write_update_attr(struct inode *inode, loff_t pos, ssize_t written)
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	bool ret = false;
 
-	spin_lock(&fi->lock);
-	fi->attr_version = atomic64_inc_return(&fc->attr_version);
-	if (written > 0 && pos > inode->i_size) {
-		i_size_write(inode, pos);
-		ret = true;
-	}
-	spin_unlock(&fi->lock);
-
+	/*
+	 * Bump the global attr version so stale cached attrs are detected.
+	 * WRITE_ONCE is sufficient: some readers don't hold fi->lock, and
+	 * on x86_64 the store is naturally atomic. fi->lock is only needed
+	 * for the i_size extension case below.
+	 */
+	WRITE_ONCE(fi->attr_version, atomic64_inc_return(&fc->attr_version));
 	fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
 
+	/*
+	 * Only take fi->lock when the write may extend the file. Parallel
+	 * direct writes are gated on fuse_io_past_eof() returning false, so
+	 * this slow path is not taken on the hot parallel-write path.
+	 */
+	if (written > 0 && pos > i_size_read(inode)) {
+		spin_lock(&fi->lock);
+		if (pos > inode->i_size) {
+			i_size_write(inode, pos);
+			ret = true;
+		}
+		spin_unlock(&fi->lock);
+	}
+
 	return ret;
 }
 
@@ -3154,7 +3167,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
 	fi->writectr = 0;
-	fi->iocachectr = 0;
+	atomic_set(&fi->iocachectr, 0);
 	init_waitqueue_head(&fi->page_waitq);
 	init_waitqueue_head(&fi->direct_io_waitq);
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 120de517cea0..67077afb3 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -153,8 +153,13 @@ struct fuse_inode {
 			 * (FUSE_NOWRITE) means more writes are blocked */
 			int writectr;
 
-			/** Number of files/maps using page cache */
-			int iocachectr;
+			/**
+			 * Refcount for inode I/O mode: > 0 means cached I/O
+			 * users, 0 is idle, < 0 means parallel uncached I/Os
+			 * in flight. Use atomic ops; fi->lock only needed
+			 * for the 0↔±1 boundary transitions.
+			 */
+			atomic_t iocachectr;
 
 			/* Waitq for writepage completion */
 			wait_queue_head_t page_waitq;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 7c0403a00..81e01cb55 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -190,7 +190,7 @@ static void fuse_evict_inode(struct inode *inode)
 		atomic64_inc(&fc->evict_ctr);
 	}
 	if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) {
-		WARN_ON(fi->iocachectr != 0);
+		WARN_ON(atomic_read(&fi->iocachectr) != 0);
 		WARN_ON(!list_empty(&fi->write_files));
 		WARN_ON(!list_empty(&fi->queued_writes));
 	}
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index c99e285f3..611baacf9 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -17,7 +17,7 @@
  */
 static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi)
 {
-	return READ_ONCE(fi->iocachectr) < 0 && !fuse_inode_backing(fi);
+	return atomic_read(&fi->iocachectr) < 0 && !fuse_inode_backing(fi);
 }
 
 /*
@@ -60,9 +60,9 @@ int fuse_file_cached_io_open(struct inode *inode, struct fuse_file *ff)
 	WARN_ON(ff->iomode == IOM_UNCACHED);
 	if (ff->iomode == IOM_NONE) {
 		ff->iomode = IOM_CACHED;
-		if (fi->iocachectr == 0)
-			set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
-		fi->iocachectr++;
+		if (!atomic_read(&fi->iocachectr))
+			set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+		atomic_inc(&fi->iocachectr);
 	}
 	spin_unlock(&fi->lock);
 	return 0;
@@ -72,11 +72,10 @@ static void fuse_file_cached_io_release(struct fuse_file *ff,
 					struct fuse_inode *fi)
 {
 	spin_lock(&fi->lock);
-	WARN_ON(fi->iocachectr <= 0);
+	WARN_ON(atomic_read(&fi->iocachectr) <= 0);
 	WARN_ON(ff->iomode != IOM_CACHED);
 	ff->iomode = IOM_NONE;
-	fi->iocachectr--;
-	if (fi->iocachectr == 0)
+	if (!atomic_dec_return(&fi->iocachectr))
 		clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
 	spin_unlock(&fi->lock);
 }
@@ -85,20 +84,35 @@ static void fuse_file_cached_io_release(struct fuse_file *ff,
 int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb)
 {
 	struct fuse_backing *oldfb;
-	int err = 0;
+	int old, err = 0;
+
+	/*
+	 * Lockless fast path for per-I/O calls (fb == NULL) while uncached
+	 * I/O is already in flight: a negative counter means no cached users
+	 * can be present, so just take one more reference with a CAS loop.
+	 * The 0 -> -1 transition falls through to the locked path below so
+	 * that it stays serialized with fuse_file_cached_io_open().
+	 */
+	if (!fb) {
+		old = atomic_read(&fi->iocachectr);
+		while (old < 0) {
+			if (atomic_try_cmpxchg(&fi->iocachectr, &old, old - 1))
+				return 0;
+		}
+	}
 
 	spin_lock(&fi->lock);
 	/* deny conflicting backing files on same fuse inode */
 	oldfb = fuse_inode_backing(fi);
 	if (fb && oldfb && oldfb != fb) {
 		err = -EBUSY;
 		goto unlock;
 	}
-	if (fi->iocachectr > 0) {
+	if (atomic_read(&fi->iocachectr) > 0) {
 		err = -ETXTBSY;
 		goto unlock;
 	}
-	fi->iocachectr--;
+	atomic_dec(&fi->iocachectr);
 
 	/* fuse inode holds a single refcount of backing file */
 	if (fb && !oldfb) {
@@ -133,10 +147,20 @@ void fuse_inode_uncached_io_end(struct fuse_inode *fi)
 {
 	struct fuse_backing *oldfb = NULL;
 
+	/*
+	 * Fast path: other uncached I/Os still in flight -- just increment
+	 * and return without taking fi->lock.
+	 */
+	if (atomic_inc_return(&fi->iocachectr) < 0)
+		return;
+
+	/*
+	 * This may be the last uncached I/O. Take the lock and re-check:
+	 * a new uncached I/O may have started between the atomic_inc_return
+	 * and the spin_lock, so only wake/clear if iocachectr is still zero.
+	 */
 	spin_lock(&fi->lock);
-	WARN_ON(fi->iocachectr >= 0);
-	fi->iocachectr++;
-	if (!fi->iocachectr) {
+	if (!atomic_read(&fi->iocachectr)) {
 		wake_up(&fi->direct_io_waitq);
 		oldfb = fuse_inode_backing_set(fi, NULL);
 	}
-- 
2.51.0
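The iocachectr protocol described in patch 2/2 can be exercised outside the kernel. The following is a minimal single-file model using C11 <stdatomic.h>, assuming a spinlock built from atomic_flag in place of fi->lock. All model_* names are hypothetical, backing files are ignored, a cached open fails instead of sleeping on direct_io_waitq, and a plain counter stands in for wake_up(). As the commit message describes, the 0 -> -1 transition stays under the lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Userspace model of fuse_inode::iocachectr: > 0 counts cached-I/O users,
 * < 0 counts uncached I/O users, 0 is idle. All model_* names are
 * hypothetical; "lock" stands in for fi->lock.
 */
struct model_inode {
	atomic_int iocachectr;
	atomic_flag lock;
	int wakeups;            /* stands in for wake_up(&fi->direct_io_waitq) */
};

static void model_lock(struct model_inode *mi)
{
	while (atomic_flag_test_and_set_explicit(&mi->lock, memory_order_acquire))
		;               /* spin until the flag was previously clear */
}

static void model_unlock(struct model_inode *mi)
{
	atomic_flag_clear_explicit(&mi->lock, memory_order_release);
}

static void model_init(struct model_inode *mi)
{
	atomic_init(&mi->iocachectr, 0);
	atomic_flag_clear(&mi->lock);
	mi->wakeups = 0;
}

/* Per-I/O start (fb == NULL): lockless only while already negative. */
static int model_uncached_io_start(struct model_inode *mi)
{
	int old = atomic_load(&mi->iocachectr);
	int err = 0;

	while (old < 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mi->iocachectr, &old, old - 1))
			return 0;
	}

	/* 0 -> -1, or cached users present: decide under the lock. */
	model_lock(mi);
	if (atomic_load(&mi->iocachectr) > 0)
		err = -ETXTBSY;
	else
		atomic_fetch_sub(&mi->iocachectr, 1);
	model_unlock(mi);
	return err;
}

static void model_uncached_io_end(struct model_inode *mi)
{
	/* atomic_fetch_add() returns the old value, so add 1 for the new one. */
	int now = atomic_fetch_add(&mi->iocachectr, 1) + 1;

	assert(now <= 0);       /* caller held an uncached reference */
	if (now < 0)
		return;         /* other uncached I/O still in flight */

	model_lock(mi);
	if (atomic_load(&mi->iocachectr) == 0)
		mi->wakeups++;  /* still idle after re-checking under the lock */
	model_unlock(mi);
}

/* Cached open: the check and the increment happen in one lock hold. */
static int model_cached_io_open(struct model_inode *mi)
{
	int err = 0;

	model_lock(mi);
	if (atomic_load(&mi->iocachectr) < 0)
		err = -EBUSY;   /* the kernel sleeps on direct_io_waitq instead */
	else
		atomic_fetch_add(&mi->iocachectr, 1);
	model_unlock(mi);
	return err;
}
```

The point of keeping 0 -> -1 under the lock is that an uncached start which finds the counter idle cannot slip in between a cached open's check and its increment; once the counter is already negative, no cached user can be present, so further references only need the CAS.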