[PATCH 0/2] NVMe_over_TCP: support specifying the congestion-control

Mingbao Sun posted 2 patches 4 years, 3 months ago
drivers/nvme/host/fabrics.c    | 24 ++++++++++++++++
drivers/nvme/host/fabrics.h    |  2 ++
drivers/nvme/host/tcp.c        | 20 ++++++++++++-
drivers/nvme/target/configfs.c | 52 ++++++++++++++++++++++++++++++++++
drivers/nvme/target/nvmet.h    |  1 +
drivers/nvme/target/tcp.c      | 27 ++++++++++++++++++
6 files changed, 125 insertions(+), 1 deletion(-)
Posted by Mingbao Sun 4 years, 3 months ago
From: Mingbao Sun <tyler.sun@dell.com>

Hi all,

Congestion control can have a noticeable impact on the performance
of TCP-based communication. This is of course also true of
NVMe_over_TCP.

Different congestion-control algorithms (e.g., cubic, dctcp) suit
different scenarios. Choosing an appropriate algorithm benefits
performance, while choosing an unsuitable one can severely degrade it.

We can already specify the congestion control of NVMe_over_TCP by
writing '/proc/sys/net/ipv4/tcp_congestion_control', but this also
changes the congestion control of every future TCP socket that has
not been explicitly assigned one, and so may affect the performance
of those sockets as well.

So it makes sense to let NVMe_over_TCP specify its own
congestion control.
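
For illustration only, the difference between the global sysctl and a
per-socket setting can be sketched from user space with the
TCP_CONGESTION socket option. This is not the code from the patches,
which operate on kernel sockets inside the drivers; the helper names
and the CA_NAME_LEN constant below are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Buffer size for algorithm names; mirrors the kernel's TCP_CA_NAME_MAX (16). */
#define CA_NAME_LEN 16

/* Hypothetical helper: select congestion control `name` for this socket only.
 * Other sockets keep whatever the global sysctl default says. */
int set_congestion_control(int fd, const char *name)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}

/* Hypothetical helper: read back the algorithm currently attached to `fd`.
 * `buf` is zeroed first and one byte is reserved for the trailing NUL. */
int get_congestion_control(int fd, char *buf, socklen_t len)
{
	if (len == 0)
		return -1;
	memset(buf, 0, len);
	len -= 1;
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

A caller would create a TCP socket, call
set_congestion_control(fd, "dctcp") before connecting, and check the
return value, since the call fails when the requested algorithm is not
available or not permitted on that system.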

The first commit addresses the target side, and the second one
addresses the host side.

Mingbao Sun (2):
  nvmet-tcp: support specifying the congestion-control
  nvme-tcp: support specifying the congestion-control

 drivers/nvme/host/fabrics.c    | 24 ++++++++++++++++
 drivers/nvme/host/fabrics.h    |  2 ++
 drivers/nvme/host/tcp.c        | 20 ++++++++++++-
 drivers/nvme/target/configfs.c | 52 ++++++++++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h    |  1 +
 drivers/nvme/target/tcp.c      | 27 ++++++++++++++++++
 6 files changed, 125 insertions(+), 1 deletion(-)

-- 
2.26.2
Re: [PATCH 0/2] NVMe_over_TCP: support specifying the congestion-control
Posted by Mingbao Sun 4 years, 3 months ago
I feel I should elaborate a little on the motivation behind this
feature.

As you know, InfiniBand/RoCE gives NVMe-oF a lossless network
environment (that is, zero packet loss), which is a great advantage
for performance.

In contrast, 'TCP/IP + Ethernet' is often used as a lossy network
environment in which packets are frequently dropped.
Once a packet is dropped, a retransmission timeout can be triggered,
and every such timeout does great damage to performance.

So although NVMe/TCP may offer bandwidth competitive with that of
NVMe/RDMA, its exposure to packet drops is a weakness for its
performance.

However, with the combination of the following conditions, NVMe/TCP
can almost be as competitive as NVMe/RDMA in the data center.

  - Ethernet NICs supporting QoS configuration (mapping the TOS/DSCP
    field of the IP header to a priority, and PFC)

  - Ethernet switches supporting ECN marking and adjustment of the
    buffer size of each priority

  - NVMe/TCP supporting a configurable TOS for its TCP traffic
    (already implemented)

  - NVMe/TCP supporting dctcp as the congestion control of its
    TCP sockets (the work of this feature)

So this feature is the last piece needed on the software side to
complete the above combination.
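
To make the TOS/DSCP item concrete, here is a minimal user-space
sketch of how a DSCP value maps into the IPv4 TOS byte and how that
byte can be set on a single socket. It is an illustration under stated
assumptions, not the NVMe/TCP implementation: the helper names are
hypothetical, and which DSCP value a deployment should use depends on
how its NICs and switches are configured.

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* DSCP occupies the upper six bits of the IPv4 TOS byte;
 * the lower two bits carry ECN and are left at zero here. */
int dscp_to_tos(int dscp)
{
	return (dscp & 0x3f) << 2;
}

/* Hypothetical helper: tag the traffic of one IPv4 socket with `dscp`,
 * so that QoS-aware NICs and switches can map it to a priority. */
int set_socket_dscp(int fd, int dscp)
{
	int tos = dscp_to_tos(dscp);

	return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

For example, dscp_to_tos(26) yields 104 (0x68), which is the TOS byte
a switch would see for DSCP 26.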