[RFC PATCH 00/36] cifs: [WIP] Overhaul message handling and improve network transport

David Howells posted 36 patches 5 days, 14 hours ago
Hi Steve et al,

[!] NOTE: These patches are NOT FULLY WRITTEN, won't necessarily compile
    all the way through and haven't been fully tested.  This is intended
    as a preview of what I'm working on.  Basic SMB2+ has worked as far as
    "cifs: Convert SMB2 Write request".  Encrypted and signed messages
    should work but haven't been tested; compression is disabled.  Assume
    that anything beyond the specified point won't work.

    SMB1 should work up to somewhere around "cifs: Rewrite base TCP
    transmission", but somewhere beyond that it won't compile.  I need to
    go back and fix this up.

    RDMA almost certainly won't work.  Ideally, I would like to make RDMA
    message passing (rather than direct data transport) supply the received
    fragments in a bvecq to the message parsing routines.

The aim of this patchset is to build up a list of fragments for each
request using a bvecq.  These form a segmented list and can be spliced
together when assembling a compound request.  The segmented list can then
be passed to sendmsg() with MSG_SPLICE_PAGES in a single call, thereby only
having a single loop (in the TCP stack) to shovel data, rather than loops
within loops.  Possibly we can dispense with TCP corking also, provided we
can tell the socket to flush at the record boundaries.  (Note that this
also simplifies smbd_send() for RDMA.)
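
As a rough userspace illustration of the fragment-list idea above (this is
NOT the kernel bvecq API: struct frag_seg, struct frag_list and all the
helper names are made up for this sketch, and struct iovec merely stands in
for whatever a real segment holds), two requests' lists can be spliced in
O(1) and then walked once to feed a single send call:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define FRAGS_PER_SEG 4		/* arbitrary for this sketch */

/* Hypothetical stand-in for one segment of a bvecq-style list. */
struct frag_seg {
	struct frag_seg *next;
	unsigned int nr;		/* slots in use */
	struct iovec frags[FRAGS_PER_SEG];
};

/* The fragment list belonging to one request. */
struct frag_list {
	struct frag_seg *head, *tail;
};

/* Link an empty, caller-provided segment onto the end of the list. */
static void frag_list_add_seg(struct frag_list *fl, struct frag_seg *seg)
{
	assert(seg);
	seg->next = NULL;
	seg->nr = 0;
	if (fl->tail)
		fl->tail->next = seg;
	else
		fl->head = seg;
	fl->tail = seg;
}

/* Record a fragment in the last segment; -1 means a new segment is needed. */
static int frag_list_add(struct frag_list *fl, void *base, size_t len)
{
	struct frag_seg *seg = fl->tail;

	if (!seg || seg->nr == FRAGS_PER_SEG)
		return -1;
	seg->frags[seg->nr].iov_base = base;
	seg->frags[seg->nr].iov_len = len;
	seg->nr++;
	return 0;
}

/* Splice src onto the end of dst in O(1); no payload is copied. */
static void frag_list_splice(struct frag_list *dst, struct frag_list *src)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	src->head = src->tail = NULL;
}

/*
 * Flatten the list into one iovec array, as would be handed to a single
 * sendmsg().  Returns the number of entries, or -1 if iov[] is too small.
 */
static int frag_list_to_iov(const struct frag_list *fl, struct iovec *iov,
			    int max, size_t *total)
{
	const struct frag_seg *seg;
	unsigned int i;
	int n = 0;

	*total = 0;
	for (seg = fl->head; seg; seg = seg->next) {
		for (i = 0; i < seg->nr; i++) {
			if (n == max)
				return -1;
			iov[n++] = seg->frags[i];
			*total += seg->frags[i].iov_len;
		}
	}
	return n;
}
```

The point of the sketch is only that assembling a compound becomes pointer
manipulation on the lists, with the one remaining data loop left to
whoever consumes the flattened result.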

To make this easier, I want to introduce a "request descriptor", which I'm
calling "struct smb_message" and allocate it at a higher level, notably the
PDU marshalling routines in cifssmb.c and smb2pdu.c and then hand that down
into the transport.  It will contain the list of fragments that form the
message.

mid_q_entry is then 'absorbed' into smb_message.  The transport then
doesn't allocate these, but uses the ones that it is given and the I/O
thread gets to simplify its refcounting and do less of it.  The rule is
that smb_message gets an extra ref when it is enqueued and whoever dequeues
it gets this ref and either puts it or hands it on.  The PDU encoding
routines get a ref when allocating them and keep the refs until they
complete.
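
A minimal userspace sketch of that ref-passing rule, assuming C11 atomics
in place of the kernel's refcount_t and a plain unlocked list in place of
the real pending queue (struct msg, struct msg_queue and the msg_*()
helpers are hypothetical names, not taken from the patches):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct msg {
	atomic_int ref;
	struct msg *q_next;
};

struct msg_queue {
	struct msg *head, *tail;
};

/* The encoder's ref: held from allocation until the request completes. */
static struct msg *msg_alloc(void)
{
	struct msg *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->ref, 1);
	return m;
}

static void msg_get(struct msg *m)
{
	atomic_fetch_add(&m->ref, 1);
}

/* Drop a ref; returns 1 if this was the last one and the message was freed. */
static int msg_put(struct msg *m)
{
	int old = atomic_fetch_sub(&m->ref, 1);

	assert(old > 0);
	if (old != 1)
		return 0;
	free(m);
	return 1;
}

/* Enqueuing takes an extra ref on behalf of the queue. */
static void msg_enqueue(struct msg_queue *q, struct msg *m)
{
	msg_get(m);
	m->q_next = NULL;
	if (q->tail)
		q->tail->q_next = m;
	else
		q->head = m;
	q->tail = m;
}

/*
 * Dequeuing does not touch the count: the caller inherits the queue's ref
 * and must either put it or hand it on.  (No locking in this sketch.)
 */
static struct msg *msg_dequeue(struct msg_queue *q)
{
	struct msg *m = q->head;

	if (!m)
		return NULL;
	q->head = m->q_next;
	if (!q->head)
		q->tail = NULL;
	m->q_next = NULL;
	return m;
}
```

With this shape, the dequeuing side never needs a get/put pair of its own
just to look at a message, which is where the reduction in refcounting for
the I/O thread would come from.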

smb_message is then given a next pointer to allow compounds to be trivially
assembled, with the protocol wrangling being done in the transport.  This
next pointer also allows a bunch of fixed-size arrays to be got rid of
(which were imposing weird restrictions like reducing the maximum component
count of a compound if we stole a kvec[] slot for the transform header).

Request buffers will be allocated from a per-connection page frag allocator
rather than from kmalloc(), thereby allowing them to be passed to
MSG_SPLICE_PAGES.

To this end, I make the following significant changes.  Note that some of
the changes are transitional steps towards a later stage.

 (0) Make SMB1 transport use the SMB2 transport rather than having parallel
     dispatch code (now upstream).

 (1) Make skb_splice_from_iter() special case ITER_BVECQ-type iterators and
     walk the bvecq directly rather than calling iov_iter_extract_pages().
     This allows access to the information on the bvecq about whether a
     memory fragment is held by a page ref or by a pin - which is something
     sk_buff needs to take account of at some point.

 (2) Provide netfslib facilities to splice the receive buffers directly out
     of a TCP socket into a bvecq, allowing the socket lock to be dropped
     earlier and reducing the amount of time sendmsg is held up.

 (3) Replace mid_q_entry with smb_message and also include credits and
     smb_rqst therein.

 (4) Rewrite cifs TCP transmission to be able to use MSG_SPLICE_PAGES:

     (a) Copy all the data involved in a message into a big buffer formed
     	 of a sequence of pages attached to a bvecq.

     (b) If encrypting the message just encrypt this buffer.  Converting
     	 this to a scatterlist is much simpler (and uses less memory) than
     	 encrypting from the protocol elements.

     (c) As the pages in the bvecq are just that, they have refcounts and
     	 can be passed to MSG_SPLICE_PAGES - thereby avoiding the copy in
     	 TCP.

     (d) Compression should be a matter of vmap()'ing these pages to form
     	 the source buffer, allocating a second buffer of pages to form a
     	 dest buffer, also in a bvecq, vmapping that and then doing the
     	 compression.  The first buffer can then just be replaced by the
     	 second.

     (e) __smb_send_rqst() can then do a single sendmsg() with
     	 MSG_SPLICE_PAGES from an ITER_BVECQ-type iterator.

     (f) smbd_send() can push the same buffer to smbd_post_send_iter() from
     	 the same iterator.

 (5) Rewrite cifs TCP reception to use the facility to splice the receive
     queue out of the socket and into a bvecq rather than using recvmsg()
     to read it.  The bvecq is then processed through helper functions to
     parse incoming messages for both SMB1 and SMB2/3.  This allows reading
     to be deferred to avoid blocking the I/O thread.

 (6) Clean up mid->callback_data.  Replace it with a waitqueue in
     smb_message (for most commands) and a cifs_io_subrequest pointer (for
     read and write).  Make request completion wait on the smb_message
     waitqueue rather than on server->response_q to avoid thundering herd
     issues.

     (Also, I note that under some circumstances, cifs just wakes up the
     first thing on server->response_q without any reference to *what* it
     is waking up).

 (7) Add some more bits to smb_message to hold the buffers in a bvecq with
     the intent of killing off the smb_rqst struct.

     (a) The PDU encoders will have to work out how much memory they need
     	 for the request protocol bits in advance and tell the smb_message
     	 allocator their requirements.  This will get the requested amount
     	 from the netmem allocator, so it needs to be correctly sized.  A
     	 pointer is then set in smb->request to the buffer.

     (b) The smb_message is given a pointer (->next) to chain to another
     	 message to be compounded after it.

     (c) smb_send_recv_messages() will be used to dispatch a synchronous
     	 request.  If the head smb_message's ->next pointer is not NULL, it
     	 will set the appropriate compound chaining stuff and insert
     	 appropriate padding.  Then it will link the bvecq structs of those
     	 messages together.

 (8) Convert PDU encoders to allocate and use smb_message and pass it down.

     (a) So far, SMB2 Negotiate Protocol, Session Setup, Logoff, Tree
     	 Connect, Tree Disconnect, Read and Write have been done - and
     	 though they build if SMB1 and compression are disabled, they won't
     	 work yet and so haven't been tested.

     (b) SMB2 Posix Mkdir has been attempted and will compile, but is
     	 likely to need rejigging as it's a close associate of Create.

     (c) SMB2 Create/Open is partially done and won't compile.  This gets
     	 complicated because it's used in a lot of places and also gets
     	 compounded - so anything that gets compounded with it must also be
     	 converted.
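
To illustrate the chaining step described in (7)(c), here is a small
standalone sketch of walking a ->next chain and working out the padding
and the next-command offset for each component.  It relies on the SMB2
rule that each compounded request starts on an 8-byte boundary and that
the last component's NextCommand is zero.  The struct sketch_msg layout,
the chain_compound() name and this particular ALIGN8() definition are
assumptions made for the example, not what the patches contain:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to the next multiple of 8 (assumed definition). */
#define ALIGN8(x) (((x) + 7u) & ~(size_t)7)

/* Hypothetical cut-down message descriptor. */
struct sketch_msg {
	struct sketch_msg *next;	/* next component of the compound */
	size_t len;			/* unpadded length of this PDU */
	size_t pad;			/* padding to append (computed) */
	uint32_t next_command;		/* offset to next header (computed) */
};

/*
 * Fill in pad and next_command for every message in the chain and return
 * the total number of bytes that would go on the wire.
 */
static size_t chain_compound(struct sketch_msg *head)
{
	struct sketch_msg *m;
	size_t total = 0;

	assert(head);
	for (m = head; m; m = m->next) {
		if (m->next) {
			size_t padded = ALIGN8(m->len);

			m->pad = padded - m->len;
			m->next_command = (uint32_t)padded;
		} else {
			/* The final component is not padded or chained. */
			m->pad = 0;
			m->next_command = 0;
		}
		total += m->len + m->pad;
	}
	return total;
}
```

Because the chain is a linked list rather than a fixed-size array, nothing
here limits how many components a compound can carry.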

The patches can be found here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental

Thanks,
David

David Howells (36):
  net: Perform special handling for a splice from a bvecq
  netfs: Add a facility to splice TCP receive buffers into a bvecq
  netfs: Add some TCP receive queue helpers
  cifs, nls: Provide unicode size determination func
  cifs: Introduce an ALIGN8() macro
  cifs: Rename mid_q_entry to smb_message
  cifs: Add "Has dynamic part" flag from SMB2/3 StructureSize LSB
  cifs: Add an enum to hold a trace value for the command/subcommand
  cifs: Institute message managing struct
  cifs: Split crypt_message() into encrypt and decrypt variants
  cifs: Add new AEAD alloc and setup routines that draw from an iterator
  cifs: [WIP] Rewrite base Rx to put data off the socket into a bvecq
  cifs: Remove validate_t2()
  cifs: Remove cifs_io_subrequest::got_bytes
  cifs: Pass smb_message to cifs_verify_signature()
  cifs: Rewrite base TCP transmission
  cifs: Don't use corking
  cifs: Use page frag allocator for Tx buffers
  cifs: Try to better handle the "Dynamic" flag in StructureSize2 in
    SMB2/3
  cifs: Pass smb_message structs down into the transport layer
  cifs: Add a tracepoint to trace the smb_message refcount
  cifs: Trace smb1/2_copy_to_prepped_buffers()
  cifs: Clean up mid->callback_data and kill off mid->creator
  cifs: Add netmem allocation functions
  cifs: Add more pieces to smb_message
  cifs: Convert SMB2 Negotiate Protocol request
  cifs: Convert SMB2 Session Setup request
  cifs: Convert SMB2 Logoff request
  cifs: Convert SMB2 Tree Connect request
  cifs: Convert SMB2 Tree Disconnect request
  cifs: Convert SMB2 Read request
  cifs: Convert SMB2 Write request
  cifs: [WIP] Don't copy new-style smb_messages to a set of pages
  cifs: [WIP] Rearrange Create request subfuncs
  cifs: [WIP] Convert SMB2 Posix Mkdir request
  cifs: [WIP] Convert SMB2 Open request

 fs/netfs/Makefile             |    4 +
 fs/netfs/rxqueue.c            |  532 ++++++
 fs/netfs/tcp_splice.c         |  269 +++
 fs/nls/nls_base.c             |   33 +
 fs/smb/client/cached_dir.c    |   41 +-
 fs/smb/client/cifs_debug.c    |   53 +-
 fs/smb/client/cifs_debug.h    |    3 +-
 fs/smb/client/cifs_unicode.c  |   39 +
 fs/smb/client/cifs_unicode.h  |    2 +
 fs/smb/client/cifsencrypt.c   |    4 +-
 fs/smb/client/cifsfs.c        |   30 +-
 fs/smb/client/cifsglob.h      |  297 ++--
 fs/smb/client/cifsproto.h     |  168 +-
 fs/smb/client/cifssmb.c       |  345 ++--
 fs/smb/client/compress.c      |  155 +-
 fs/smb/client/compress.h      |   14 +-
 fs/smb/client/connect.c       |  707 ++++----
 fs/smb/client/ntlmssp.h       |    8 +-
 fs/smb/client/reparse.c       |    2 +-
 fs/smb/client/sess.c          |  306 ++--
 fs/smb/client/smb1debug.c     |   56 +-
 fs/smb/client/smb1encrypt.c   |  132 +-
 fs/smb/client/smb1maperror.c  |   15 +-
 fs/smb/client/smb1misc.c      |   22 +-
 fs/smb/client/smb1ops.c       |   96 +-
 fs/smb/client/smb1pdu.h       |   62 +-
 fs/smb/client/smb1proto.h     |   58 +-
 fs/smb/client/smb1session.c   |    4 +-
 fs/smb/client/smb1transport.c | 1154 +++++++++----
 fs/smb/client/smb2file.c      |    3 +-
 fs/smb/client/smb2inode.c     |    8 +-
 fs/smb/client/smb2maperror.c  |    3 +-
 fs/smb/client/smb2misc.c      |  423 +++--
 fs/smb/client/smb2ops.c       | 1190 ++------------
 fs/smb/client/smb2pdu.c       | 2889 +++++++++++++++++----------------
 fs/smb/client/smb2proto.h     |   83 +-
 fs/smb/client/smb2transport.c | 1172 +++++++++++--
 fs/smb/client/smbdirect.c     |  105 +-
 fs/smb/client/smbdirect.h     |    5 +-
 fs/smb/client/trace.h         |  180 ++
 fs/smb/client/transport.c     | 1254 ++++++++------
 fs/smb/common/smb2pdu.h       |   55 +-
 fs/smb/server/smb2pdu.c       |   22 +-
 include/linux/netfs.h         |   37 +
 include/linux/nls.h           |    1 +
 include/trace/events/netfs.h  |   28 +
 net/core/skbuff.c             |  119 ++
 47 files changed, 7135 insertions(+), 5053 deletions(-)
 create mode 100644 fs/netfs/rxqueue.c
 create mode 100644 fs/netfs/tcp_splice.c