[PATCH v2 0/2] FUSE: Implement atomic lookup + open

Dharmendra Singh posted 2 patches 4 years, 3 months ago
fs/fuse/dir.c             | 179 +++++++++++++++++++++++++++++++++-----
fs/fuse/file.c            |  30 ++++++-
fs/fuse/fuse_i.h          |  13 ++-
fs/fuse/inode.c           |   4 +-
fs/fuse/ioctl.c           |   2 +-
include/uapi/linux/fuse.h |   2 +
6 files changed, 204 insertions(+), 26 deletions(-)
[PATCH v2 0/2] FUSE: Implement atomic lookup + open
Posted by Dharmendra Singh 4 years, 3 months ago
In FUSE, as of now, uncached lookups are expensive over the wire:
they add latency and put load on (metadata) servers when issued by
thousands of clients. In some cases these lookup calls can be
avoided. The following two patches address this issue.

The first patch handles the case where we open a file/dir for the first
time, or create a file (O_CREAT), but do a lookup on it first. After the
lookup is performed, we make another call into libfuse to open the file.
These two separate calls into libfuse can now be combined and performed as
a single call into libfuse.

The second patch handles the case where we are opening an already existing
file (positive dentry). Before this open call, we revalidate the inode, and
this revalidation does a lookup on the file and verifies the inode.
This separate lookup can also be avoided (for non-directories) and combined
with the open call into libfuse.
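
To make the two cases above concrete, here is a toy model in Python of the requests sent to libfuse for a single open(), as this cover letter describes them. It is only a sketch under stated assumptions: the function name and the request labels "LOOKUP", "OPEN" and "ATOMIC_OPEN" are illustrative and are not taken from the patches, and the positive-dentry case assumes the dentry has to be revalidated with a lookup (i.e. it is not served from a still-valid cache entry).

```python
from typing import List


def open_requests(positive_dentry: bool, is_dir: bool, patched: bool) -> List[str]:
    """Return the requests sent to userspace for one open() in this toy model.

    The labels are illustrative strings, not the kernel's opcode names.
    """
    if not positive_dentry:
        # Patch 1: first open of a file/dir. Unpatched, a lookup is followed
        # by a separate open; patched, both travel as one combined request.
        return ["ATOMIC_OPEN"] if patched else ["LOOKUP", "OPEN"]

    # Patch 2: positive dentry. Revalidation issues a lookup before the open.
    # The cover letter says this lookup is avoided only for non-directories.
    if patched and not is_dir:
        return ["ATOMIC_OPEN"]
    return ["LOOKUP", "OPEN"]


if __name__ == "__main__":
    for positive in (False, True):
        for patched in (False, True):
            print(positive, patched, open_requests(positive, False, patched))
```

In this model a regular file goes from two requests per open to one in both cases, while a directory with a positive dentry still needs the separate lookup.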

Here is the link to the libfuse pull request which implements atomic open:
https://github.com/libfuse/libfuse/pull/644

I am going to post performance results shortly.


Dharmendra Singh (2):
  FUSE: Implement atomic lookup + open
  FUSE: Avoid lookup in d_revalidate()

 fs/fuse/dir.c             | 179 +++++++++++++++++++++++++++++++++-----
 fs/fuse/file.c            |  30 ++++++-
 fs/fuse/fuse_i.h          |  13 ++-
 fs/fuse/inode.c           |   4 +-
 fs/fuse/ioctl.c           |   2 +-
 include/uapi/linux/fuse.h |   2 +
 6 files changed, 204 insertions(+), 26 deletions(-)

-- 
2.17.1
Re: [PATCH v2 0/2] FUSE: Atomic lookup + open performance numbers
Posted by Dharmendra Singh 4 years, 3 months ago

Thanks, Miklos. To measure the performance, bonnie++ was run over a passthrough_ll mount on tmpfs.
When taking numbers on a VM, I saw non-deterministic behaviour in the results. Therefore core
binding was used for passthrough_ll and bonnie++, keeping them on separate cores.

Here are the Google Sheets with the performance numbers:
https://docs.google.com/spreadsheets/d/1JRgF8DTR9xk5zz3_azmLcyy5kW3bgjjItmS8CYsAoT4/edit#gid=0
https://docs.google.com/spreadsheets/d/1JRgF8DTR9xk5zz3_azmLcyy5kW3bgjjItmS8CYsAoT4/edit#gid=1833203226

The following libfuse patches (the commits from March 7 and March 8 in the first link) were used to test
these changes:
https://github.com/aakefbs/libfuse/commits/atomic-open-and-no-flush
https://github.com/libfuse/libfuse/pull/644

Parameters used in mounting passthrough_ll:
 numactl --localalloc --physcpubind=16-23 passthrough_ll -f -osource=/tmp/source,allow_other,allow_root,cache=never -o max_idle_threads=1 /tmp/dest
     (Here cache=never results in direct-io on the file)

Parameters used in bonnie++:
In sheet 0B:
numactl --localalloc --physcpubind=0-7  bonnie++ -x 4 -q -s0  -d /tmp/dest/ -n 10:0:0:10 -r 0 -u 0 2>/dev/null

In sheet 1B:
numactl --localalloc --physcpubind=0-7 bonnie++ -x 4 -q -s0 -d /tmp/dest/ -n 10:1:1:10 -r 0 -u 0 2>/dev/null

Additional settings done on the testing machine:
cpupower frequency-set -g performance

Running bonnie++ gives us results for Create/s, Read/s and Delete/s. The tables below summarise the numbers
for these three operations. Please note that for a read of 0 bytes, bonnie++ performs the operations
create-open, close and stat, in that order, but no atomic open. Therefore the performance results in sheet 0B
include the overhead of extra stat calls. In sheet 1B, we directed bonnie++ to read 1 byte, which triggered
the atomic open call, but the numbers for this run include the overhead of the read operation itself rather
than just a plain open/close.

Here are the tables summarising the performance numbers:

Table: 0B
                                            Sequential                    | Random
                                            Create/s  Read/s    Del/s     | Create/s  Read/s    Del/s
Patched Libfuse                             -3.55%    -4.9%     -4.43%    | -0.4%     -1.6%     -1.0%
Patched Libfuse + No-Flush                  +22.3%    +6%       +5.15%    | +27.9%    +14.5%    +2.8%
Patched Libfuse + Patched FuseK             +22.9%    +6.1%     +5.3%     | +28.3%    +14.5%    +2.3%
Patched Libfuse + Patched FuseK + No-Flush  +33.4%    -4.4%     -3.73%    | +38.8%    -2.5%     -2.0%



Table: 1B
                                            Sequential                    | Random
                                            Create/s  Read/s    Del/s     | Create/s  Read/s    Del/s
Patched Libfuse                             -0.22%    -0.35%    -0.7%     | -0.27%    -0.78%    -2.35%
Patched Libfuse + No-Flush                  +2.5%     +2.6%     -9.6%     | +2.5%     -8.6%     -6.26%
Patched Libfuse + Patched FuseK             +1.63%    -1.0%     -11.45%   | +4.48%    -6.84%    -4.0%
Patched Libfuse + Patched FuseK + No-Flush  +32.43%   +26.61%   +076%     | +33.2%    +14.7%    -0.40%

Here, No-Flush = no flush trigger from the fuse kernel into libfuse.
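
For readers who want to reproduce figures of this kind from raw bonnie++ output, the sketch below shows one common way to express a result as a signed percentage relative to a baseline run. This is an assumption about the method: the mail does not state the exact formula used for the sheets, and the function names and ops/s values here are hypothetical, not measurements from this thread.

```python
def percent_change(baseline_ops: float, patched_ops: float) -> float:
    """Relative change of patched_ops versus baseline_ops, in percent."""
    if baseline_ops <= 0:
        raise ValueError("baseline must be a positive ops/s value")
    return (patched_ops - baseline_ops) / baseline_ops * 100.0


def format_change(value: float) -> str:
    """Format with an explicit sign, in the style used by the tables."""
    return f"{value:+.2f}%"


if __name__ == "__main__":
    # Hypothetical ops/s values, chosen only to illustrate the arithmetic.
    print(format_change(percent_change(2000.0, 2500.0)))
    print(format_change(percent_change(2000.0, 1900.0)))
```

A positive value means the patched configuration completed more operations per second than the baseline; a negative value means fewer.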

In Table 1B, the 4th row shows good improvements for both Create and Read, whereas Del is almost
unchanged. In Table 0B, the 4th row shows reduced Read performance; this was found to be caused by some
changes in libfuse. This was fixed, and in the same row of Table 1B we can see improved numbers.

In Table 0B, the 3rd row shows good numbers because bonnie++ read 0 bytes, which changed its behaviour
and affected performance, whereas the same row in Table 1B shows reduced numbers because the 1-byte read
involved flush calls from the fuse kernel into libfuse.

These changes are not only about reducing fuse kernel/userspace context switches; our main goal is to
improve performance for network file systems:
   - Reduce the number of network round trips
   - Reduce load on metadata servers with thousands of clients
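
As a rough illustration of the two goals above, the following Python sketch multiplies out how many requests reach a metadata server when each open costs two requests (lookup, then open) versus one combined request. All numbers and names are hypothetical and serve only to show the arithmetic; they are not measurements.

```python
def total_requests(clients: int, opens_per_client: int, requests_per_open: int) -> int:
    """Total requests seen by the server for the given workload."""
    return clients * opens_per_client * requests_per_open


if __name__ == "__main__":
    # Hypothetical workload: 1000 clients, each opening 100 uncached files.
    separate = total_requests(1000, 100, 2)  # lookup + open as two requests
    combined = total_requests(1000, 100, 1)  # one combined request per open
    print(separate, combined, separate - combined)
```

Under these assumptions the combined request halves the number of round trips for open-heavy workloads, and the saving grows linearly with the number of clients.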

Fewer kernel/userspace context switches are 'just' a side effect.

Thanks,
Dharmendra
Re: [PATCH v2 0/2] FUSE: Implement atomic lookup + open
Posted by Dharmendra Hans 4 years, 2 months ago
On Tue, Mar 22, 2022 at 5:22 PM Dharmendra Singh <dharamhans87@gmail.com> wrote:
>
> In FUSE, as of now, uncached lookups are expensive over the wire.
> E.g additional latencies and stressing (meta data) servers from
> thousands of clients. These lookup calls possibly can be avoided
> in some cases. Incoming two patches addresses this issue.
>
> First patch handles the case where we open first time a file/dir or create
> a file (O_CREAT) but do a lookup first on it. After lookup is performed
> we make another call into libfuse to open the file. Now these two separate
> calls into libfuse can be combined and performed as a single call into
> libfuse.
>
> Second patch handles the case when we are opening an already existing file
> (positive dentry). Before this open call, we re-validate the inode and
> this re-validation does a lookup on the file and verify the inode.
> This separate lookup also can be avoided (for non-dir) and combined
> with open call into libfuse.
>
> Here is the link to the libfuse pull request which implements atomic open
> https://github.com/libfuse/libfuse/pull/644
>
> I am going to post performance results shortly.
>
>
> Dharmendra Singh (2):
>   FUSE: Implement atomic lookup + open
>   FUSE: Avoid lookup in d_revalidate()

A gentle reminder to look into the above patch set.
Re: [PATCH v2 0/2] FUSE: Implement atomic lookup + open
Posted by Dharmendra Hans 4 years, 2 months ago
On Tue, Mar 29, 2022 at 4:37 PM Dharmendra Hans <dharamhans87@gmail.com> wrote:
>
> On Tue, Mar 22, 2022 at 5:22 PM Dharmendra Singh <dharamhans87@gmail.com> wrote:
> >
> > In FUSE, as of now, uncached lookups are expensive over the wire.
> > E.g additional latencies and stressing (meta data) servers from
> > thousands of clients. These lookup calls possibly can be avoided
> > in some cases. Incoming two patches addresses this issue.
> >
> > First patch handles the case where we open first time a file/dir or create
> > a file (O_CREAT) but do a lookup first on it. After lookup is performed
> > we make another call into libfuse to open the file. Now these two separate
> > calls into libfuse can be combined and performed as a single call into
> > libfuse.
> >
> > Second patch handles the case when we are opening an already existing file
> > (positive dentry). Before this open call, we re-validate the inode and
> > this re-validation does a lookup on the file and verify the inode.
> > This separate lookup also can be avoided (for non-dir) and combined
> > with open call into libfuse.
> >
> > Here is the link to the libfuse pull request which implements atomic open
> > https://github.com/libfuse/libfuse/pull/644
> >
> > I am going to post performance results shortly.
> >
> >
> > Dharmendra Singh (2):
> >   FUSE: Implement atomic lookup + open
> >   FUSE: Avoid lookup in d_revalidate()
>
> A gentle reminder to look into the above patch set.
Sending a gentle reminder again to look into the requested patches.