[RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator

David Howells posted 31 patches 4 months, 1 week ago
fs/netfs/Makefile             |    1 +
fs/netfs/buffer.c             |  102 ++
fs/nls/nls_base.c             |   33 +
fs/smb/client/Makefile        |    2 +-
fs/smb/client/cached_dir.c    |   41 +-
fs/smb/client/cifs_debug.c    |   15 +-
fs/smb/client/cifs_unicode.c  |   39 +
fs/smb/client/cifs_unicode.h  |    2 +
fs/smb/client/cifsencrypt.c   |   77 +-
fs/smb/client/cifsfs.c        |   31 +-
fs/smb/client/cifsglob.h      |  254 ++--
fs/smb/client/cifsproto.h     |   91 +-
fs/smb/client/cifssmb.c       |  123 +-
fs/smb/client/cifstransport.c |  251 ++++
fs/smb/client/compress.c      |    4 +-
fs/smb/client/compress.h      |    6 +-
fs/smb/client/connect.c       |  158 +-
fs/smb/client/netmisc.c       |   15 +-
fs/smb/client/ntlmssp.h       |   20 +-
fs/smb/client/reparse.c       |    2 +-
fs/smb/client/sess.c          |  271 ++--
fs/smb/client/smb1ops.c       |  101 +-
fs/smb/client/smb2file.c      |    2 +-
fs/smb/client/smb2inode.c     |    6 +-
fs/smb/client/smb2misc.c      |   80 +-
fs/smb/client/smb2ops.c       |  570 +++----
fs/smb/client/smb2pdu.c       | 2616 +++++++++++++++++----------------
fs/smb/client/smb2proto.h     |   65 +-
fs/smb/client/smb2transport.c |  229 ++-
fs/smb/client/smbdirect.c     |   88 +-
fs/smb/client/smbdirect.h     |    5 +-
fs/smb/client/trace.h         |   61 +
fs/smb/client/transport.c     | 1372 +++++++----------
fs/smb/common/smb2pdu.h       |   92 +-
fs/smb/server/connection.c    |    2 +-
fs/smb/server/oplock.c        |    4 +-
fs/smb/server/smb2misc.c      |   14 +-
fs/smb/server/smb2ops.c       |   38 +-
fs/smb/server/smb2pdu.c       |   66 +-
fs/smb/server/smb_common.c    |    2 +-
include/linux/bvec.h          |   13 +
include/linux/iov_iter.h      |   77 +-
include/linux/netfs.h         |    4 +
include/linux/nls.h           |    1 +
include/linux/uio.h           |   11 +
lib/iov_iter.c                |  406 ++++-
lib/tests/kunit_iov_iter.c    |  196 +++
47 files changed, 4177 insertions(+), 3482 deletions(-)
create mode 100644 fs/netfs/buffer.c
create mode 100644 fs/smb/client/cifstransport.c
[RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by David Howells 4 months, 1 week ago
Hi Steve et al,

[!] NOTE: These patches are NOT FULLY WRITTEN, won't necessarily compile
    all the way through and haven't been fully tested and this is intended
    as a preview of what I'm working on.  Basic SMB1 and SMB2+ work as far
    as "cifs: Rewrite base TCP transmission".  Encrypted, signed and RDMA
    may work but haven't been tested; compressed is disabled.  Assume that
    anything beyond the specified point won't work.

I'm trying to work towards the network filesystems being able to use
zerocopy facilities such as MSG_SPLICE_PAGES and, in doing so, I think the
cifs driver can be simplified quite a bit.

However, making it possible to use the zerocopy facilities is probably
going to mean moving away from storing the data to be transmitted in slab
pages as slab pages don't have refcounts and their lifetime rules are
unsuitable for splicing.

So what I'd like to do is to allocate protocol bufferage from the netmem
allocator.  The netmem allocator, if I understand it correctly, manages DMA
mapping and IOMMU wrangling in bulk, and works as a bump/page fragment
allocator - which has the potential to improve performance a bit.

The aim is to build up a list of fragments for each request using a bvecq.
These form a segmented list and can be spliced together when assembling a
compound request.  The segmented list can then be passed to sendmsg() with
MSG_SPLICE_PAGES in a single call, thereby only having a single loop (in
the TCP stack) to shovel data, not loops-over-loops.  Possibly we can
dispense with corking also, provided we can tell TCP to flush the record
boundaries.  (Note that this also simplifies smbd_send() for RDMA).

To make this easier, I want to introduce a "request descriptor", which I'm
calling "struct smb_message" and allocate it at a higher level, notably the
PDU encoding routines in cifssmb.c and smb2pdu.c and then hand that down
into the transport.  It will contain the list of fragments that form the
message.

smb_message then replaces mid_q_struct, absorbing its contents.  The
transport then doesn't allocate these, but uses the ones that it is given
and the I/O thread gets to simplify its refcounting and do less of it.  The
rule is that smb_message gets an extra ref when it is enqueued and whoever
dequeues it gets this ref and either puts it or hands it on.  The PDU
encoding routines get a ref when allocating them and keep the refs until
they complete.

smb_message is then given a next pointer to allow compounds to be trivially
assembled, with the protocol wrangling being done in the transport.  This
next pointer also allows a bunch of fixed-size arrays to be got rid of
(which were imposing weird restrictions like reducing the maximum component
count of a compound if we stole a kvec[] slot for the transform header).

To this end, I make the following significant changes.  Note that some of
the changes are a way to transit to a later stage.

 (1) Make SMB1 transport use the SMB2 transport rather than having parallel
     dispatch code.

     (a) Get rid of the separation of the message buffer into two kvecs,
     	 one for the rfc1002 header and one for the rest.  This seems to be
     	 only to simplify the signing code - but at the expense of making a
     	 bunch of other places more complex.

     (b) Make SendReceive() wrap cifs_send_recv() as does SendReceive2().

     (c) Get rid of SendReceiveBlockingLock() in favour of using
     	 SendReceive() and a couple of flags to request interruptible
     	 waiting and to indicate Windows-type unlock rather than lock
     	 cancellation.

 (2) Replace mid_q_struct with smb_message and also include credits and
     smb_rqst therein.

 (3) Rewrite the base TCP transmission to be able to use MSG_SPLICE_PAGES.

     (a) Copy all the data involved in a message into a big buffer formed
     	 of a sequence of pages attached to a bvecq.

     (b) If encrypting the message just encrypt this buffer.  Converting
     	 this to a scatterlist is much simpler (and uses less memory) than
     	 encrypting from the protocol elements.

     (c) As the pages in the bvecq are just that, they have refcounts and
     	 can be passed to MSG_SPLICE_PAGES - thereby avoiding the copy in
     	 TCP.

     (d) Compression should be a matter of vmap()'ing these pages to form
     	 the source buffer, allocating a second buffer of pages to form a
     	 dest buffer, also in a bvecq, vmapping that and then doing the
     	 compression.  The first buffer can then just be replaced by the
     	 second.

     (e) __smb_send_rqst() can then do a single sendmsg() with
     	 MSG_SPLICE_PAGES() from an ITER_BVECQ-type iterator.

     (f) smbd_send() can push the same buffer to smbd_post_send_iter() from
     	 the same iterator.

 (4) Clean up mid->callback_data.  Replace it with a waitqueue in
     smb_message (for most commands) and a cifs_io_subrequest pointer (for
     read and write).  Make request completion wait on the smb_message
     waitqueue rather than on server->response_q to avoid thundering herd
     issues.

     (Also, I note that under some circumstances, cifs just wakes up the
     first thing on server->response_q without any reference to *what* it
     is waking up).

 (5) Add some more bits to smb_message to hold the buffers in a bvecq with
     the intent of killing of the smb_rqst struct.

     (a) The PDU encoders will have to work out how much memory they need
     	 for the request protocol bits in advance and tell the smb_message
     	 allocator their requirements.  This will get the requested amount
     	 from the netmem allocator, so it needs to be correctly sized.  A
     	 pointer is then set in smb->request to the buffer.

     (b) The smb_message is given a pointer (->next) to chain to another
     	 message to be compounded after it.

     (c) smb_send_recv_messages() will be used to dispatch a synchronous
     	 request.  If the head smb_message's ->next pointer is not NULL, it
     	 will set the appropriate compound chaining stuff and insert
     	 appropriate padding.  Then it will link the bvecq structs of those
     	 messages together.

 (6) Convert PDU encoders to allocate and use smb_message and pass it down.

     (a) So far, SMB2 Negotiate Protocol, Session Setup, Logoff, Tree
     	 Connect and Tree Disconnect have been done - and though they build
     	 if SMB1 and compression are disabled, they won't work yet and so
     	 haven't been tested.

     (b) SMB2 Posix Mkdir has been attempted and will compile, but is
     	 likely to need rejigging as it's a close associate of Create.

     (c) SMB2 Create/Open is partially done and won't compile.  This gets
     	 complicated because it's used in a lot of places and also gets
     	 compounded - so anything that gets compounded with it must also be
     	 converted.

The patches can be found here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental

Thanks,
David

David Howells (31):
  iov_iter: Move ITER_DISCARD and ITER_XARRAY iteration out-of-line
  iov_iter: Add a segmented queue of bio_vec[]
  netfs: Provide facility to alloc buffer in a bvecq
  cifs, nls: Provide unicode size determination func
  cifs: Introduce an ALIGN8() macro
  cifs: Move the SMB1 transport code out of transport.c
  cifs: Rename mid_q_entry to smb_message
  cifs: Keep the CPU-endian command ID around
  cifs: Rename SMB2_xxxx_HE to SMB2_xxxx
  cifs: Make smb1's SendReceive() wrap cifs_send_recv()
  cifs: Fix SMB1 to not require separate kvec for the rfc1002 header
  cifs: Replace SendReceiveBlockingLock() with SendReceive() plus flags
  cifs: Institute message managing struct
  cifs: Split crypt_message() into encrypt and decrypt variants
  cifs: Use netfs_alloc/free_folioq_buffer()
  cifs: Rewrite base TCP transmission
  cifs: Rework smb2 decryption
  cifs: Pass smb_message structs down into the transport layer
  cifs: Clean up mid->callback_data and kill off mid->creator
  cifs: Don't need state locking in smb2_get_mid_entry()
  cifs: [DEBUG] smb_message refcounting
  cifs: Add netmem allocation functions
  cifs: Add more pieces to smb_message
  cifs: Convert SMB2 Negotiate Protocol request
  cifs: Convert SMB2 Session Setup request
  cifs: Convert SMB2 Logoff request
  cifs: Convert SMB2 Tree Connect request
  cifs: Convert SMB2 Tree Disconnect request
  cifs: Rearrange Create request subfuncs
  cifs: Convert SMB2 Posix Mkdir request
  cifs: Convert SMB2 Open request

 fs/netfs/Makefile             |    1 +
 fs/netfs/buffer.c             |  102 ++
 fs/nls/nls_base.c             |   33 +
 fs/smb/client/Makefile        |    2 +-
 fs/smb/client/cached_dir.c    |   41 +-
 fs/smb/client/cifs_debug.c    |   15 +-
 fs/smb/client/cifs_unicode.c  |   39 +
 fs/smb/client/cifs_unicode.h  |    2 +
 fs/smb/client/cifsencrypt.c   |   77 +-
 fs/smb/client/cifsfs.c        |   31 +-
 fs/smb/client/cifsglob.h      |  254 ++--
 fs/smb/client/cifsproto.h     |   91 +-
 fs/smb/client/cifssmb.c       |  123 +-
 fs/smb/client/cifstransport.c |  251 ++++
 fs/smb/client/compress.c      |    4 +-
 fs/smb/client/compress.h      |    6 +-
 fs/smb/client/connect.c       |  158 +-
 fs/smb/client/netmisc.c       |   15 +-
 fs/smb/client/ntlmssp.h       |   20 +-
 fs/smb/client/reparse.c       |    2 +-
 fs/smb/client/sess.c          |  271 ++--
 fs/smb/client/smb1ops.c       |  101 +-
 fs/smb/client/smb2file.c      |    2 +-
 fs/smb/client/smb2inode.c     |    6 +-
 fs/smb/client/smb2misc.c      |   80 +-
 fs/smb/client/smb2ops.c       |  570 +++----
 fs/smb/client/smb2pdu.c       | 2616 +++++++++++++++++----------------
 fs/smb/client/smb2proto.h     |   65 +-
 fs/smb/client/smb2transport.c |  229 ++-
 fs/smb/client/smbdirect.c     |   88 +-
 fs/smb/client/smbdirect.h     |    5 +-
 fs/smb/client/trace.h         |   61 +
 fs/smb/client/transport.c     | 1372 +++++++----------
 fs/smb/common/smb2pdu.h       |   92 +-
 fs/smb/server/connection.c    |    2 +-
 fs/smb/server/oplock.c        |    4 +-
 fs/smb/server/smb2misc.c      |   14 +-
 fs/smb/server/smb2ops.c       |   38 +-
 fs/smb/server/smb2pdu.c       |   66 +-
 fs/smb/server/smb_common.c    |    2 +-
 include/linux/bvec.h          |   13 +
 include/linux/iov_iter.h      |   77 +-
 include/linux/netfs.h         |    4 +
 include/linux/nls.h           |    1 +
 include/linux/uio.h           |   11 +
 lib/iov_iter.c                |  406 ++++-
 lib/tests/kunit_iov_iter.c    |  196 +++
 47 files changed, 4177 insertions(+), 3482 deletions(-)
 create mode 100644 fs/netfs/buffer.c
 create mode 100644 fs/smb/client/cifstransport.c
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by Stefan Metzmacher 4 months, 1 week ago
Hi David,

> The aim is to build up a list of fragments for each request using a bvecq.
> These form a segmented list and can be spliced together when assembling a
> compound request.  The segmented list can then be passed to sendmsg() with
> MSG_SPLICE_PAGES in a single call, thereby only having a single loop (in
> the TCP stack) to shovel data, not loops-over-loops.  Possibly we can
> dispense with corking also, provided we can tell TCP to flush the record
> boundaries.  (Note that this also simplifies smbd_send() for RDMA).

I didn't look at the patches in detail, but I like the simplifications
for the transport layer and that hopefully allows me to
move smbd_send() behind sendmsg() with MSG_SPLICE_PAGES in the end.

So the current situation is that we memcpy (at least) in sendmsg()
and with your patches we do a memcpy higher in the stack, but then
use MSG_SPLICE_PAGES in order to do it twice. Is that correct?

Thanks!
metze
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by David Howells 4 months, 1 week ago
Stefan Metzmacher <metze@samba.org> wrote:

> So the current situation is that we memcpy (at least) in sendmsg()
> and with your patches we do a memcpy higher in the stack, but then
> use MSG_SPLICE_PAGES in order to do it twice. Is that correct?

Not twice, no.  MSG_SPLICE_PAGES allows sendmsg() to splice the supplied pages
into the sk_buffs directly, thereby avoiding a copy in the TCP layer and
cutting out the feeder loop in cifs.

However, this is meant to be an intermediate step.  I actually want to
assemble the fragment list in a bvecq in the smb_create_request() as called by
the PDU encoders, with everything aligned for crypto so that the crypto layer
doesn't copy it also.  But cleaning up the transport first should hopefully
reduce the size of the later patches.

David
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by Stefan Metzmacher 4 months, 1 week ago
Am 07.08.25 um 08:24 schrieb David Howells:
> Stefan Metzmacher <metze@samba.org> wrote:
> 
>> So the current situation is that we memcpy (at least) in sendmsg()
>> and with your patches we do a memcpy higher in the stack, but then
>> use MSG_SPLICE_PAGES in order to do it twice. Is that correct?
> 
> Not twice, no.  MSG_SPLICE_PAGES allows sendmsg() to splice the supplied pages
> into the sk_buffs directly, thereby avoiding a copy in the TCP layer and
> cutting out the feeder loop in cifs.

Yes, and we must be careful to not touch the pages after
calling sendmsg(MSG_SPLICE_PAGES).

And unlike MSG_ZEROCOPY tcp_sendmsg_locked() has no
no struct ubuf_info *uarg when MSG_SPLICE_PAGES is used
and there's no way to know when the pages are no longer
used by the tcp stack.

Can you explain how/where we allocate the memory and where
we unreference it in the caller of sendmsg(MSG_SPLICE_PAGES).

> However, this is meant to be an intermediate step.  I actually want to
> assemble the fragment list in a bvecq in the smb_create_request() as called by
> the PDU encoders, with everything aligned for crypto so that the crypto layer
> doesn't copy it also.  But cleaning up the transport first should hopefully
> reduce the size of the later patches.

Sounds good :-)

metze
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by David Howells 4 months, 1 week ago
Stefan Metzmacher <metze@samba.org> wrote:

> >> So the current situation is that we memcpy (at least) in sendmsg()
> >> and with your patches we do a memcpy higher in the stack, but then
> >> use MSG_SPLICE_PAGES in order to do it twice. Is that correct?
> > Not twice, no.  MSG_SPLICE_PAGES allows sendmsg() to splice the supplied
> > pages
> > into the sk_buffs directly, thereby avoiding a copy in the TCP layer and
> > cutting out the feeder loop in cifs.
> 
> Yes, and we must be careful to not touch the pages after
> calling sendmsg(MSG_SPLICE_PAGES).

Until we get a response from the server, yes, but for the protocol info that
shouldn't be an issue.  And if we're going to encrypt, we'll have to do a copy
anyway for something like Write, but we can get the encryption algo to do that
for us by giving it a separate destination buffer.

> And unlike MSG_ZEROCOPY tcp_sendmsg_locked() has no
> no struct ubuf_info *uarg when MSG_SPLICE_PAGES is used
> and there's no way to know when the pages are no longer
> used by the tcp stack.

Correct (and this is something we'll need to address), but for the moment we
can rely on page refcounts.  MSG_SPLICE_PAGES takes a ref on each page - which
is why you can't use it with slab memory.  However, if we pass in
netmem-allocated memory, that works by refcounting, so that should work.

> Can you explain how/where we allocate the memory and where
> we unreference it in the caller of sendmsg(MSG_SPLICE_PAGES).

Currently, we allocate the buffer in fs/netfs/buffer.c in
netfs_alloc_bvecq_buffer().  That just bulk allocates a bunch of pages and
adds them into a bvecq.  As they're untyped pages, we can use the refcount.  I
want to allocate netmem instead, but I haven't done that yet.

We then call sendmsg(MSG_SPLICE_PAGES) and then drop our ref on the pages.
TCP will have taken its own ref which it will drop in due course when the
skbuffs are cleaned up.

David
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by Enzo Matsumiya 4 months, 1 week ago
Hi David,

On 08/06, David Howells wrote:
> (3) Rewrite the base TCP transmission to be able to use MSG_SPLICE_PAGES.
>
>     (a) Copy all the data involved in a message into a big buffer formed
>     	 of a sequence of pages attached to a bvecq.

Nice!

>     (b) If encrypting the message just encrypt this buffer.  Converting
>     	 this to a scatterlist is much simpler (and uses less memory) than
>     	 encrypting from the protocol elements.

This could've been done a long ago, but since you're on it... crypto API
supports in-place ops (src == dst), so you can save yet another allocation
here.

This is also possible from SMB2 side because the spec says if encryption
fails, the request should be failed as a whole.

>     (d) Compression should be a matter of vmap()'ing these pages to form
>     	 the source buffer, allocating a second buffer of pages to form a
>     	 dest buffer, also in a bvecq, vmapping that and then doing the
>     	 compression.  The first buffer can then just be replaced by the
>     	 second.

OTOH, compression can't be in-place because SMB2 says that if
compression fails, the original uncompressed request must be sent (i.e.
src must be left untouched until smb_compress() finishes).

I don't know how relevant these comments are for you and your patches,
but HTH.

Directly related to the patches: it would be great if you could handle
commented out code, either by completely removing them, or by providing
some in-code fallback mechanism (I'll reply to the patches).

Other than that, +1 for the cleanup/refactoring; I really like the new
smb_message/transport infrastructure!


Cheers,

Enzo
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by David Howells 4 months, 1 week ago
Hi Enzo,

> >     (d) Compression should be a matter of vmap()'ing these pages to form
> >     	 the source buffer, allocating a second buffer of pages to form a
> >     	 dest buffer, also in a bvecq, vmapping that and then doing the
> >     	 compression.  The first buffer can then just be replaced by the
> >     	 second.
> 
> OTOH, compression can't be in-place because SMB2 says that if
> compression fails, the original uncompressed request must be sent (i.e.
> src must be left untouched until smb_compress() finishes).

I've got a change which should achieve this, but it seems I can't test it.
None of ksmbd, samba and azure seem to support it:-/

David
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by Enzo Matsumiya 4 months, 1 week ago
On 08/08, David Howells wrote:
>Hi Enzo,
>
>> >     (d) Compression should be a matter of vmap()'ing these pages to form
>> >     	 the source buffer, allocating a second buffer of pages to form a
>> >     	 dest buffer, also in a bvecq, vmapping that and then doing the
>> >     	 compression.  The first buffer can then just be replaced by the
>> >     	 second.
>>
>> OTOH, compression can't be in-place because SMB2 says that if
>> compression fails, the original uncompressed request must be sent (i.e.
>> src must be left untouched until smb_compress() finishes).
>
>I've got a change which should achieve this, but it seems I can't test it.
>None of ksmbd, samba and azure seem to support it:-/

Yes, Windows 11 or Windows Server 2022+ only.

Compression for ksmbd and samba have been on my TODO list for too long,
I should get back to it :/

Anyway, if you want me to test, just send me the patches.
I have your linux-fs remote as well, if that's easier.


Cheers,

Enzo
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by David Howells 4 months ago
Hi Enzo,

I now have encryption, compression and encryption+compression all working :-)

I've pushed my patches here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental

It should work up to "cifs: Don't use corking".

Btw, is is_compressible() actually worth doing?  It seems to copy a lot of
data (up to 4M) to an extra buffer and then do various analyses on it,
including doing a sort.

I need to extract a fix for collect_sample(), which I can do tomorrow, but it
should look something like:

/*
 * Collect some 2K samples with 2K gaps between.
 */
static int collect_sample(const struct iov_iter *source, ssize_t max, u8 *sample)
{
	struct iov_iter iter = *source;
	size_t s = 0;

	while (iov_iter_count(&iter) >= SZ_2K) {
		size_t part = umin(umin(iov_iter_count(&iter), SZ_2K), max);
		size_t n;

		n = copy_from_iter(sample + s, part, &iter);
		if (n != part)
			return -EFAULT;

		s += n;
		max -= n;

		if (iov_iter_count(&iter) < PAGE_SIZE - SZ_2K)
			break;

		iov_iter_advance(&iter, SZ_2K);
	}

	return s;
}

What's currently upstream won't work and may crash because it assumes that
ITER_XARRAY is in use - which should now never be true.

Also, there's a bug in wireshark's LZ77 decoder.  See attached patch.

David

diff --git a/epan/tvbuff_lz77.c b/epan/tvbuff_lz77.c
index a609912636..13ab5c50ed 100644
--- a/epan/tvbuff_lz77.c
+++ b/epan/tvbuff_lz77.c
@@ -68,7 +68,7 @@ static bool do_uncompress(tvbuff_t *tvb, int offset, int in_size,
 						in_off += 2;
 						if (match_len == 0) {
 							/* This case isn't documented */
-							match_len = tvb_get_letohs(tvb, offset+in_off);
+							match_len = tvb_get_letohl(tvb, offset+in_off);
 							in_off += 4;
 						}
 						if (match_len < 15+7)
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by Enzo Matsumiya 4 months ago
On 08/11, David Howells wrote:
>Hi Enzo,
>
>I now have encryption, compression and encryption+compression all working :-)
>
>I've pushed my patches here:
>
>	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental
>
>It should work up to "cifs: Don't use corking".

Great! I'll try it out later.

>Btw, is is_compressible() actually worth doing?  It seems to copy a lot of
>data (up to 4M) to an extra buffer and then do various analyses on it,
>including doing a sort.

Compression, as a whole, is actually only worth doing if one is paying
more for network traffic than computing.  is_compressible() tries to
balance that to avoid a "compress/fail/send original" cycle, as it takes
0-4ms on a 4M payload (on my machine) whereas, without it, a failing
cycle can take up to 40ms.

>I need to extract a fix for collect_sample(), which I can do tomorrow, but it
>should look something like:
>
>/*
> * Collect some 2K samples with 2K gaps between.
> */
>static int collect_sample(const struct iov_iter *source, ssize_t max, u8 *sample)
>{
>	struct iov_iter iter = *source;
>	size_t s = 0;
>
>	while (iov_iter_count(&iter) >= SZ_2K) {
>		size_t part = umin(umin(iov_iter_count(&iter), SZ_2K), max);
>		size_t n;
>
>		n = copy_from_iter(sample + s, part, &iter);
>		if (n != part)
>			return -EFAULT;
>
>		s += n;
>		max -= n;
>
>		if (iov_iter_count(&iter) < PAGE_SIZE - SZ_2K)
>			break;
>
>		iov_iter_advance(&iter, SZ_2K);
>	}
>
>	return s;
>}
>
>What's currently upstream won't work and may crash because it assumes that
>ITER_XARRAY is in use - which should now never be true.

Yes, compression was merged when that was the only case.

>Also, there's a bug in wireshark's LZ77 decoder.  See attached patch.

Good catch :)
There are several, actually... if you vary the compression parameters
defined (min len, min/max dist, hash log) within acceptable limits,
you'll notice that, even though wireshark might show some as malformed
packets, Windows is able to decode them just fine.

I really need to reserve some time to work on this again :(


Cheers,

Enzo
Re: [RFC PATCH 00/31] netfs: [WIP] Allow the use of MSG_SPLICE_PAGES and use netmem allocator
Posted by David Howells 4 months, 1 week ago
Enzo Matsumiya <ematsumiya@suse.de> wrote:

> Anyway, if you want me to test, just send me the patches.
> I have your linux-fs remote as well, if that's easier.

If you look at:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental

You can see a patch with the subject "cifs: [!] FIX transport compression".
Grab that and everything up to it.

I'm pretty certain it won't work, but I can't test it.  Well, maybe I can
force it on and look at the packet trace if wireshark can handle it.

Some things to note:

 (1) In smb_compress(), netfs_alloc_bvecq_buffer() is used to allocate the
     destination buffer and attach it to a bvecq, which it also allocates.  An
     iterator can be set on this to define part of the buffer to operate on.

 (2) vmap_bvecq() is used to map the source and the destination buffers.  It
     extracts the pages and then calls vmap() on them.

 (3) Space for the compression header is allocated by smb_send_rqst() in the
     first bvecq slot along with the rfc1002 header and transform header (if
     sealing).  A pointer is passed down to smb_compress().

 (4) It attempts to adjust the values such that the compression header is
     included in the encrypted section if also sealing.  However, it might be
     better to have smb_compress() place it in the output buffer.

 (5) If compression is successful, smb_compress() switches the original bvecq
     and the output bvecq and moves the header segment from the original to
     the output.

 (6) If the compression algo returns -EMSGSIZE, then the compression header is
     excluded from the header segment.

David