From: Russ Fellows
To: linux-fuse@vger.kernel.org
Cc: miklos@szeredi.hu, linux-kernel@vger.kernel.org, Russ Fellows, stable@vger.kernel.org
Subject: [PATCH 1/2] fuse: fix FOPEN_PARALLEL_DIRECT_WRITES being ignored for passthrough writes
Date: Fri, 29 May 2026 03:12:03 +0000
Message-ID: <20260529031210.7021-2-russ.fellows@gmail.com>
In-Reply-To: <20260529031210.7021-1-russ.fellows@gmail.com>

FOPEN_PARALLEL_DIRECT_WRITES has no effect on passthrough-backed FUSE
files due to two independent bugs, either of which alone prevents it
from working. Both must be fixed to restore parallel write concurrency.

Bug 1: fuse_passthrough_write_iter() acquires the exclusive inode lock
directly:

	inode_lock(inode);
	ret = backing_file_write_iter(...);
	inode_unlock(inode);

This serializes all concurrent writers regardless of whether the server
set FOPEN_PARALLEL_DIRECT_WRITES. The flag is checked by
fuse_dio_wr_exclusive_lock(), which is called from fuse_dio_lock(),
which in turn is called from fuse_direct_write_iter(), the
non-passthrough O_DIRECT path.
fuse_file_write_iter() routes passthrough opens to
fuse_passthrough_write_iter() instead, bypassing the flag check
entirely.

Bug 2: fuse_file_io_open() in iomode.c strips
FOPEN_PARALLEL_DIRECT_WRITES from any open that lacks FOPEN_DIRECT_IO:

	if (!(ff->open_flags & FOPEN_DIRECT_IO))
		ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES;

This is correct for regular direct-I/O opens, where FOPEN_DIRECT_IO
guarantees direct I/O semantics even if the caller later clears
O_DIRECT with fcntl(). It is wrong for passthrough opens: a passthrough
file already bypasses the FUSE page cache by definition, so
FOPEN_DIRECT_IO is redundant and should not be required to preserve the
parallel-writes flag.

Note: adding FOPEN_DIRECT_IO to the daemon's open flags is not a valid
workaround. fuse_file_write_iter() checks FOPEN_DIRECT_IO before
FOPEN_PASSTHROUGH, so setting both routes writes through
fuse_direct_write_iter() (which requires a userspace round-trip)
instead of fuse_passthrough_write_iter() (the zero-copy kernel path).

Combined effect: a daemon that opens with
FOPEN_PASSTHROUGH | FOPEN_PARALLEL_DIRECT_WRITES (without
FOPEN_DIRECT_IO) has the parallel flag stripped by Bug 2 before Bug 1
is even reached, so both bugs must be fixed together.

Fix Bug 1: make fuse_dio_lock() and fuse_dio_unlock() non-static and
call them from fuse_passthrough_write_iter(), replacing the open-coded
inode_lock()/inode_unlock() pair. This reuses the existing logic that
handles FOPEN_PARALLEL_DIRECT_WRITES, append writes, writes past EOF
and page-cache I/O mode transitions.

Fix Bug 2: skip the FOPEN_PARALLEL_DIRECT_WRITES strip when
FOPEN_PASSTHROUGH is set. The flag is still stripped for
non-passthrough opens without FOPEN_DIRECT_IO, preserving the existing
behaviour.

Safety: backing_file_write_iter() calls into the backing filesystem's
write_iter (e.g. xfs_file_write_iter()), which acquires the backing
inode's own lock independently.
The FUSE inode lock and the backing inode lock are entirely separate;
using inode_lock_shared() on the FUSE inode does not affect the backing
filesystem's concurrency control.

Fixes: 4d99ff8f6b85 ("fuse: implement open/create with FOPEN_PASSTHROUGH")
Cc: stable@vger.kernel.org
Signed-off-by: Russ Fellows
---
 fs/fuse/file.c        |  6 +++---
 fs/fuse/fuse_i.h      |  2 ++
 fs/fuse/iomode.c      |  8 ++++++--
 fs/fuse/passthrough.c |  6 +++---
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f94f3dc082c6..602c3f18676e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1428,8 +1428,8 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	return false;
 }
 
-static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
-			  bool *exclusive)
+void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
+		   bool *exclusive)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct fuse_inode *fi = get_fuse_inode(inode);
@@ -1455,7 +1455,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
 	}
 }
 
-static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
+void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct fuse_inode *fi = get_fuse_inode(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 17423d4e3cfa..120de517cea0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1541,6 +1541,8 @@ int fuse_file_io_open(struct file *file, struct inode *inode);
 void fuse_file_io_release(struct fuse_file *ff, struct inode *inode);
 
 /* file.c */
+void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from, bool *exclusive);
+void fuse_dio_unlock(struct kiocb *iocb, bool exclusive);
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 				 unsigned int open_flags, bool isdir);
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index c99e285f3..b3f51e3d1 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -214,10 +214,14 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
 	if (fuse_inode_backing(fi) && !(ff->open_flags & FOPEN_PASSTHROUGH))
 		goto fail;
 
-	/*
-	 * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO.
-	 */
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	/*
+	 * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO, except for
+	 * passthrough opens which bypass the page cache regardless and do not
+	 * need FOPEN_DIRECT_IO to guarantee direct I/O semantics.
+	 */
+	if (!(ff->open_flags & FOPEN_DIRECT_IO) &&
+	    !(ff->open_flags & FOPEN_PASSTHROUGH))
 		ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES;
 
 	/*
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index f2d08ac2459b..f83d0a27cfb9 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -54,11 +54,11 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb,
 				    struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	struct inode *inode = file_inode(file);
 	struct fuse_file *ff = file->private_data;
 	struct file *backing_file = fuse_file_passthrough(ff);
 	size_t count = iov_iter_count(iter);
 	ssize_t ret;
+	bool exclusive;
 	struct backing_file_ctx ctx = {
 		.cred = ff->cred,
 		.end_write = fuse_passthrough_end_write,
@@ -70,10 +70,10 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb,
 	if (!count)
 		return 0;
 
-	inode_lock(inode);
+	fuse_dio_lock(iocb, iter, &exclusive);
 	ret = backing_file_write_iter(backing_file, iter, iocb, iocb->ki_flags,
 				      &ctx);
-	inode_unlock(inode);
+	fuse_dio_unlock(iocb, exclusive);
 
 	return ret;
 }
-- 
2.51.0

From: Russ Fellows
To: linux-fuse@vger.kernel.org
Cc: miklos@szeredi.hu, linux-kernel@vger.kernel.org, Russ Fellows
Subject: [PATCH 2/2] fuse: reduce fi->lock contention on parallel direct I/O
Date: Fri, 29 May 2026 03:12:04 +0000
Message-ID: <20260529031210.7021-3-russ.fellows@gmail.com>
In-Reply-To: <20260529031210.7021-1-russ.fellows@gmail.com>

On the parallel passthrough write path, fi->lock was acquired three
times per I/O under the original code:

  1. fuse_inode_uncached_io_start() -- decrement iocachectr
  2. fuse_write_update_attr()       -- bump attr_version, check i_size
  3. fuse_inode_uncached_io_end()   -- increment iocachectr, wake waiters

At 1.7M IOPS (numjobs=8, iodepth=64, 4K) this amounts to ~5.1M spinlock
acquisitions per second (three per I/O) on a single cache line. While
the serialization fixed by patch 1/2 is the primary bottleneck, this
patch removes the remaining fi->lock overhead from the hot path.

Convert iocachectr from int to atomic_t and add lockless fast paths:

fuse_inode_uncached_io_start(fb=NULL): use an atomic_try_cmpxchg() loop
to check-and-decrement without fi->lock.
fi->lock is still taken when a backing file is supplied (fb != NULL),
i.e. for passthrough opens, which install the backing file or check it
for conflicts.

fuse_inode_uncached_io_end(): use atomic_inc_return() to detect the
still-in-flight case (counter still negative after the increment)
without a lock. fi->lock is only taken when the counter reaches zero,
to serialize the wake_up() and the backing-file clear against
concurrent opens.

fuse_write_update_attr(): skip fi->lock for the common within-EOF case.
Use WRITE_ONCE() for fi->attr_version, since some readers already
access it without fi->lock (e.g. inode.c:355 and dir.c:2069). fi->lock
is only taken when pos > i_size, with a re-check under the lock to
handle races near EOF. Parallel direct writes are only allowed when
fuse_io_past_eof() returns false in fuse_dio_lock(), so this slow path
is not taken on the hot path.

All existing callsites that access iocachectr under fi->lock are
converted to the atomic API (atomic_read(), atomic_inc(), atomic_dec(),
atomic_dec_return()).

Signed-off-by: Russ Fellows
---
 fs/fuse/file.c   | 31 +++++++++++++++++++++++--------
 fs/fuse/fuse_i.h |  9 +++++++--
 fs/fuse/inode.c  |  2 +-
 fs/fuse/iomode.c | 53 +++++++++++++++++++++++++++++++++++++++--------------
 4 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 602c3f18676e..73f870099 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1115,16 +1115,29 @@ bool fuse_write_update_attr(struct inode *inode, loff_t pos, ssize_t written)
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	bool ret = false;
 
-	spin_lock(&fi->lock);
-	fi->attr_version = atomic64_inc_return(&fc->attr_version);
-	if (written > 0 && pos > inode->i_size) {
-		i_size_write(inode, pos);
-		ret = true;
-	}
-	spin_unlock(&fi->lock);
-
+	/*
+	 * Bump the global attr version so stale cached attrs are detected.
+	 * WRITE_ONCE is sufficient: some readers don't hold fi->lock, and
+	 * on x86_64 the store is naturally atomic. fi->lock is only needed
+	 * for the i_size extension case below.
+	 */
+	WRITE_ONCE(fi->attr_version, atomic64_inc_return(&fc->attr_version));
 	fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
 
+	/*
+	 * Only take fi->lock when the write may extend the file. Parallel
+	 * direct writes are gated on fuse_io_past_eof() returning false, so
+	 * this slow path is not taken on the hot parallel-write path.
+	 */
+	if (written > 0 && pos > READ_ONCE(inode->i_size)) {
+		spin_lock(&fi->lock);
+		if (pos > inode->i_size) {
+			i_size_write(inode, pos);
+			ret = true;
+		}
+		spin_unlock(&fi->lock);
+	}
+
 	return ret;
 }
 
@@ -3154,7 +3154,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
 	fi->writectr = 0;
-	fi->iocachectr = 0;
+	atomic_set(&fi->iocachectr, 0);
 	init_waitqueue_head(&fi->page_waitq);
 	init_waitqueue_head(&fi->direct_io_waitq);
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 120de517cea0..67077afb3 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -153,8 +153,13 @@ struct fuse_inode {
 			 * (FUSE_NOWRITE) means more writes are blocked */
 			int writectr;
 
-			/** Number of files/maps using page cache */
-			int iocachectr;
+			/**
+			 * Refcount for inode I/O mode: > 0 means cached I/O
+			 * users, 0 is idle, < 0 means parallel uncached I/Os
+			 * in flight. Use atomic ops; fi->lock only needed
+			 * for the 0↔±1 boundary transitions.
+			 */
+			atomic_t iocachectr;
 
 			/* Waitq for writepage completion */
 			wait_queue_head_t page_waitq;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 7c0403a00..81e01cb55 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -190,7 +190,7 @@ static void fuse_evict_inode(struct inode *inode)
 			atomic64_inc(&fc->evict_ctr);
 	}
 	if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) {
-		WARN_ON(fi->iocachectr != 0);
+		WARN_ON(atomic_read(&fi->iocachectr) != 0);
 		WARN_ON(!list_empty(&fi->write_files));
 		WARN_ON(!list_empty(&fi->queued_writes));
 	}
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index c99e285f3..611baacf9 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -17,7 +17,7 @@
  */
 static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi)
 {
-	return READ_ONCE(fi->iocachectr) < 0 && !fuse_inode_backing(fi);
+	return atomic_read(&fi->iocachectr) < 0 && !fuse_inode_backing(fi);
 }
 
 /*
@@ -60,9 +60,9 @@ int fuse_file_cached_io_open(struct inode *inode, struct fuse_file *ff)
 	WARN_ON(ff->iomode == IOM_UNCACHED);
 	if (ff->iomode == IOM_NONE) {
 		ff->iomode = IOM_CACHED;
-		if (fi->iocachectr == 0)
-			set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
-		fi->iocachectr++;
+		if (!atomic_read(&fi->iocachectr))
+			set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+		atomic_inc(&fi->iocachectr);
 	}
 	spin_unlock(&fi->lock);
 	return 0;
@@ -72,11 +72,10 @@ static void fuse_file_cached_io_release(struct fuse_file *ff,
 					struct fuse_inode *fi)
 {
 	spin_lock(&fi->lock);
-	WARN_ON(fi->iocachectr <= 0);
+	WARN_ON(atomic_read(&fi->iocachectr) <= 0);
 	WARN_ON(ff->iomode != IOM_CACHED);
 	ff->iomode = IOM_NONE;
-	fi->iocachectr--;
-	if (fi->iocachectr == 0)
+	if (!atomic_dec_return(&fi->iocachectr))
 		clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
 	spin_unlock(&fi->lock);
 }
@@ -85,23 +84,37 @@ static void fuse_file_cached_io_release(struct fuse_file *ff,
 int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb)
 {
 	struct fuse_backing *oldfb;
-	int err = 0;
+	int old, err = 0;
+
+	/*
+	 * Fast lockless path for per-I/O calls (fb=NULL, no backing file).
+	 * Use a CAS loop to atomically verify no cached users are present
+	 * and decrement the refcount in one step.
+	 */
+	if (!fb) {
+		old = atomic_read(&fi->iocachectr);
+		do {
+			if (old > 0)
+				return -ETXTBSY;
+		} while (!atomic_try_cmpxchg(&fi->iocachectr, &old, old - 1));
+		return 0;
+	}
 
 	spin_lock(&fi->lock);
 	/* deny conflicting backing files on same fuse inode */
 	oldfb = fuse_inode_backing(fi);
-	if (fb && oldfb && oldfb != fb) {
+	if (oldfb && oldfb != fb) {
 		err = -EBUSY;
 		goto unlock;
 	}
-	if (fi->iocachectr > 0) {
+	if (atomic_read(&fi->iocachectr) > 0) {
 		err = -ETXTBSY;
 		goto unlock;
 	}
-	fi->iocachectr--;
+	atomic_dec(&fi->iocachectr);
 
 	/* fuse inode holds a single refcount of backing file */
-	if (fb && !oldfb) {
+	if (!oldfb) {
 		oldfb = fuse_inode_backing_set(fi, fb);
 		WARN_ON_ONCE(oldfb != NULL);
 	} else {
@@ -133,10 +146,20 @@ void fuse_inode_uncached_io_end(struct fuse_inode *fi)
 {
 	struct fuse_backing *oldfb = NULL;
 
+	/*
+	 * Fast path: other uncached I/Os still in flight -- just increment
+	 * and return without taking fi->lock.
+	 */
+	if (atomic_inc_return(&fi->iocachectr) < 0)
+		return;
+
+	/*
+	 * This may be the last uncached I/O. Take the lock and re-check:
+	 * a new uncached I/O may have started between the atomic_inc_return
+	 * and the spin_lock, so only wake/clear if iocachectr is still zero.
+	 */
 	spin_lock(&fi->lock);
-	WARN_ON(fi->iocachectr >= 0);
-	fi->iocachectr++;
-	if (!fi->iocachectr) {
+	if (!atomic_read(&fi->iocachectr)) {
 		wake_up(&fi->direct_io_waitq);
 		oldfb = fuse_inode_backing_set(fi, NULL);
 	}
-- 
2.51.0
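The open-time filtering and lock selection that patch 1/2 relies on can be sketched as a small userspace model. This is a sketch under stated assumptions, not kernel code: the M_FOPEN_* constants, struct model_write, model_filter_open_flags() and model_needs_exclusive() are hypothetical stand-ins, and the flag values are arbitrary rather than the UAPI values. The filter mirrors the patched condition in fuse_file_io_open(); the lock decision follows the cases the commit message lists for fuse_dio_lock() (opt-in flag, append writes, page-cache I/O mode, writes past EOF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in flag bits for this model only; the values are
 * arbitrary and are not copied from include/uapi/linux/fuse.h. */
#define M_FOPEN_DIRECT_IO              (1u << 0)
#define M_FOPEN_PARALLEL_DIRECT_WRITES (1u << 1)
#define M_FOPEN_PASSTHROUGH            (1u << 2)

/* Hypothetical snapshot of the per-write state the lock choice depends on. */
struct model_write {
	uint32_t open_flags;	/* flags left after open-time filtering */
	bool append;		/* write has IOCB_APPEND */
	bool cache_io_mode;	/* inode is in page-cache I/O mode */
	long long pos;		/* write offset */
	long long len;		/* write length */
	long long i_size;	/* current file size */
};

/* Open-time filtering after patch 1/2: the parallel-writes flag is
 * dropped only if neither FOPEN_DIRECT_IO nor FOPEN_PASSTHROUGH is set. */
static uint32_t model_filter_open_flags(uint32_t flags)
{
	if (!(flags & M_FOPEN_DIRECT_IO) && !(flags & M_FOPEN_PASSTHROUGH))
		flags &= ~M_FOPEN_PARALLEL_DIRECT_WRITES;
	return flags;
}

/* Returns true if the write must take the exclusive inode lock,
 * false if a shared lock is sufficient. */
static bool model_needs_exclusive(const struct model_write *w)
{
	if (!(w->open_flags & M_FOPEN_PARALLEL_DIRECT_WRITES))
		return true;	/* server did not opt in */
	if (w->append)
		return true;	/* append must know the final EOF */
	if (w->cache_io_mode)
		return true;	/* no shared lock alongside page-cache I/O */
	if (w->pos + w->len > w->i_size)
		return true;	/* write would extend the file */
	return false;
}
```

With these definitions, an open with FOPEN_PASSTHROUGH | FOPEN_PARALLEL_DIRECT_WRITES keeps the parallel flag, so a 4 KiB write well inside EOF may take the shared lock, while an append or an EOF-extending write still falls back to the exclusive lock.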
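The iocachectr fast paths in patch 2/2 can be illustrated with C11 <stdatomic.h> instead of the kernel's atomic_t API. A minimal sketch, assuming a hypothetical struct model_inode and model_* helpers: atomic_compare_exchange_weak() stands in for atomic_try_cmpxchg() (both refresh the expected value on failure), and atomic_fetch_add() + 1 stands in for atomic_inc_return(). fi->lock, the wait queue and the backing-file handling are deliberately not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for fi->iocachectr:
 * > 0 cached users, 0 idle, < 0 uncached users in flight. */
struct model_inode {
	atomic_int iocachectr;
};

/* Modelled on the fb == NULL fast path of fuse_inode_uncached_io_start():
 * fail if cached users exist, otherwise decrement in one CAS. */
static int model_uncached_io_start(struct model_inode *mi)
{
	int old = atomic_load(&mi->iocachectr);

	do {
		if (old > 0)
			return -ETXTBSY;
		/* On failure, 'old' is refreshed with the current value,
		 * like atomic_try_cmpxchg() in the kernel. */
	} while (!atomic_compare_exchange_weak(&mi->iocachectr, &old, old - 1));
	return 0;
}

/* Modelled on the fast path of fuse_inode_uncached_io_end(). Returns
 * true when the counter is no longer negative, i.e. when the patched
 * code falls through to take fi->lock and re-check. */
static bool model_uncached_io_end(struct model_inode *mi)
{
	/* atomic_fetch_add() returns the old value; +1 gives the new value,
	 * matching atomic_inc_return(). */
	int now = atomic_fetch_add(&mi->iocachectr, 1) + 1;

	return now >= 0;
}
```

The compare-and-swap loop is what makes the "no cached users" check and the decrement a single atomic step; a plain atomic decrement followed by a check could briefly push a positive counter down before backing out.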
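The i_size handling in the patched fuse_write_update_attr() is a double-checked update: an unlocked read filters out the common within-EOF case, and only a write that may extend the file takes the lock and re-checks. A userspace sketch, assuming a pthread mutex in place of fi->lock and C11 atomics in place of WRITE_ONCE()/READ_ONCE(); struct model_attr, model_global_attr_version and model_write_update_attr() are hypothetical names, and 'pos' is taken to be the end offset of the completed write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 'lock' plays fi->lock, 'i_size' plays
 * inode->i_size and 'attr_version' plays fi->attr_version. */
struct model_attr {
	pthread_mutex_t lock;
	_Atomic long long i_size;
	_Atomic unsigned long long attr_version;
};

/* Plays the role of the connection-wide fc->attr_version counter. */
static _Atomic unsigned long long model_global_attr_version;

/* 'pos' is the end offset of a completed write of 'written' bytes.
 * Returns true if this call extended i_size. */
static bool model_write_update_attr(struct model_attr *a, long long pos,
				    long long written)
{
	bool ret = false;
	unsigned long long v = atomic_fetch_add(&model_global_attr_version, 1) + 1;

	/* Lockless version bump (WRITE_ONCE() in the kernel patch). */
	atomic_store_explicit(&a->attr_version, v, memory_order_relaxed);

	/* Unlocked pre-check: writes ending within EOF never take the lock. */
	if (written > 0 &&
	    pos > atomic_load_explicit(&a->i_size, memory_order_relaxed)) {
		pthread_mutex_lock(&a->lock);
		/* Re-check under the lock: another writer may have already
		 * extended the file past 'pos'. */
		if (pos > atomic_load_explicit(&a->i_size, memory_order_relaxed)) {
			atomic_store_explicit(&a->i_size, pos, memory_order_relaxed);
			ret = true;
		}
		pthread_mutex_unlock(&a->lock);
	}
	return ret;
}
```

The re-check under the lock matters when two writers both see pos > i_size: without it, the writer that takes the lock second could shrink i_size back to its own, smaller end offset.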