This patch set is the second part of the new version of [1] (the first part
is available at [2]). The changes in this version are listed at the end.
- Background
SMC-D is currently used on IBM Z with the ISM function to optimize network
interconnect for intra-CPC communication. Inspired by this, we try to make
SMC-D available on non-s390 architectures through a software-implemented
Emulated-ISM device, the loopback-ism device introduced here, to accelerate
inter-process or inter-container communication within the same OS instance.
- Design
This patch set includes 3 parts:
- Patch #1: some preparatory work for loopback-ism.
- Patch #2-#7: implement the loopback-ism device. Note that loopback-ism
currently serves only SMC and exposes no userspace interface.
- Patch #8-#11: memory copy optimization for the intra-OS scenario.
The loopback-ism device is designed as an ISMv2 device and is not limited to
a specific net namespace. Both ends of an inter-process connection (1/1' in the
diagram below) or of an inter-container connection (2/2' in the diagram below)
can find the same available loopback-ism device and choose it during the CLC
handshake.
Container 1 (ns1) Container 2 (ns2)
+-----------------------------------------+ +-------------------------+
| +-------+ +-------+ +-------+ | | +-------+ |
| | App A | | App B | | App C | | | | App D |<-+ |
| +-------+ +---^---+ +-------+ | | +-------+ |(2') |
| |127.0.0.1 (1')| |192.168.0.11 192.168.0.12| |
| (1)| +--------+ | +--------+ |(2) | | +--------+ +--------+ |
| `-->| lo |-` | eth0 |<-` | | | lo | | eth0 | |
+---------+--|---^-+---+-----|--+---------+ +-+--------+---+-^------+-+
| | | |
Kernel | | | |
+----+-------v---+-----------v----------------------------------+---+----+
| | TCP | |
| | | |
| +--------------------------------------------------------------+ |
| |
| +--------------+ |
| | smc loopback | |
+---------------------------+--------------+-----------------------------+
The loopback-ism device creates a DMB (shared memory) for each connection peer.
Since data transfer occurs within the same kernel, the sndbuf of each peer
is only a descriptor that points to the same memory region as the peer DMB, so
the data copy from sndbuf to peer DMB can be avoided in the loopback-ism case.
Container 1 (ns1) Container 2 (ns2)
+-----------------------------------------+ +-------------------------+
| +-------+ | | +-------+ |
| | App C |-----+ | | | App D | |
| +-------+ | | | +-^-----+ |
| | | | | |
| (2) | | | (2') | |
| | | | | |
+---------------|-------------------------+ +----------|--------------+
| |
Kernel | |
+---------------|-----------------------------------------|--------------+
| +--------+ +--v-----+ +--------+ +--------+ |
| |dmb_desc| |snd_desc| |dmb_desc| |snd_desc| |
| +-----|--+ +--|-----+ +-----|--+ +--------+ |
| +-----|--+ | +-----|--+ |
| | DMB C | +---------------------------------| DMB D | |
| +--------+ +--------+ |
| |
| +--------------+ |
| | smc loopback | |
+---------------------------+--------------+-----------------------------+
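The sndbuf/DMB merge shown in the diagram above can be illustrated with a
minimal user-space sketch. This is not the kernel code from this series: the
names struct dmb, struct snd_desc, dmb_create(), snd_attach() and snd_write()
are hypothetical and exist only for this illustration, and locking and cursor
handling are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a receive buffer (DMB) owned by one peer. */
struct dmb {
	unsigned char *mem;	/* backing memory */
	size_t len;
};

/* Hypothetical model of a merged sndbuf: a descriptor that owns no memory. */
struct snd_desc {
	unsigned char *mem;	/* aliases the peer's DMB memory */
	size_t len;
};

/* Allocate a zeroed DMB; returns 0 on success, -1 on allocation failure. */
static int dmb_create(struct dmb *d, size_t len)
{
	d->mem = calloc(1, len);
	if (!d->mem)
		return -1;
	d->len = len;
	return 0;
}

static void dmb_destroy(struct dmb *d)
{
	free(d->mem);
	d->mem = NULL;
	d->len = 0;
}

/* "Attach": the sender's descriptor points at the peer's DMB memory. */
static void snd_attach(struct snd_desc *s, const struct dmb *peer)
{
	assert(peer->mem != NULL);
	s->mem = peer->mem;
	s->len = peer->len;
}

/*
 * Writing into the sndbuf is writing into the peer's DMB, so there is no
 * second copy from sndbuf to DMB. Returns the number of bytes written,
 * clamped to the end of the buffer.
 */
static size_t snd_write(struct snd_desc *s, size_t off,
			const void *data, size_t n)
{
	if (off >= s->len)
		return 0;
	if (n > s->len - off)
		n = s->len - off;
	memcpy(s->mem + off, data, n);
	return n;
}
```

In this model the data written by the sender is immediately visible through
the receiver's DMB, which is the copy that the no-copy mode avoids.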
- Benchmark Test
* Test environments:
- VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
- SMC sndbuf/DMB size 1MB.
* Test object:
- TCP: run on TCP loopback.
- SMC lo: run on SMC loopback-ism.
1. ipc-benchmark (see [3])
- ./<foo> -c 1000000 -s 100
                        TCP        SMC-lo
Message rate (msg/s)    81908      143128(+74.74%)
2. sockperf
- serv: <smc_run> taskset -c <cpu> sockperf sr --tcp
- clnt: <smc_run> taskset -c <cpu> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
                    TCP        SMC-lo
Bandwidth(MBps)     5082.40    8134.22(+60.05%)
Latency(us)         5.956      3.308(-44.46%)
3. nginx/wrk
- serv: <smc_run> nginx
- clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
                TCP          SMC-lo
Requests/s      190113.20    248735.41(+30.83%)
4. redis-benchmark
- serv: <smc_run> redis-server
- clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
                    TCP         SMC-lo
GET(Requests/s)     89505.48    117577.90(+31.36%)
SET(Requests/s)     89847.26    120336.95(+33.94%)
Change log:
RFC v5->RFC v4:
- Patch #2: minor changes in description of config SMC_LO and comments.
- Patch #10: minor changes in comments and if(smc_ism_support_dmb_nocopy())
check in smcd_cdc_msg_send().
- Patch #3: change smc_lo_generate_id() to smc_lo_generate_ids() and SMC_LO_CHID
to SMC_LO_RESERVED_CHID.
- Patch #5: memcpy while holding the ldev->dmb_ht_lock.
- Some expression changes in commit logs.
RFC v4->v3:
Link: https://lore.kernel.org/netdev/20240317100545.96663-1-guwen@linux.alibaba.com/
- The merge window of v6.9 is open, so post this series as an RFC.
- Patch #6: since some information fed back by smc_nl_handle_smcd_dev() does
not apply to Emulated-ISM (including loopback-ism here), loopback-ism is
not exposed through smc netlink for the time being. We may refactor this
part when the smc netlink interface is updated.
v3->v2:
Link: https://lore.kernel.org/netdev/20240312142743.41406-1-guwen@linux.alibaba.com/
- Patch #11: use tasklet_schedule(&conn->rx_tsklet) instead of smcd_cdc_rx_handler()
to avoid possible recursive locking of conn->send_lock and use {read|write}_lock_bh()
to acquire dmb_ht_lock.
v2->v1:
Link: https://lore.kernel.org/netdev/20240307095536.29648-1-guwen@linux.alibaba.com/
- All the patches: changed the term virtual-ISM to Emulated-ISM as defined by SMCv2.1.
- Patch #3: optimized the description of SMC_LO config. Avoid exposing loopback-ism
to sysfs and remove all the knobs until their future definition is clear.
- Patch #3: try to make lockdep happy by using read_lock_bh() in smc_lo_move_data().
- Patch #6: use physically contiguous DMB buffers by default.
- Patch #11: enable DMB no-copy for loopback-ism by default and free the DMB in
unregister_dmb or detach_dmb when dmb_node->refcnt reaches 0, instead of using
wait_event to keep waiting in unregister_dmb.
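
As a rough illustration of the refcount-based freeing in the Patch #11 entry
above, here is a single-threaded user-space sketch. It is an assumption-laden
model, not the code of this series: dmb_register(), dmb_attach() and dmb_put()
are hypothetical names, and a plain int stands in for whatever atomic counter
and locking the kernel implementation uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted DMB node. */
struct dmb_node {
	void *mem;
	size_t len;
	int refcnt;	/* 1 for the owner, +1 per attached peer sndbuf */
};

/* Register a DMB: the owner holds the initial reference. */
static struct dmb_node *dmb_register(size_t len)
{
	struct dmb_node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->mem = calloc(1, len);
	if (!n->mem) {
		free(n);
		return NULL;
	}
	n->len = len;
	n->refcnt = 1;
	return n;
}

/* A peer attaches its sndbuf to this DMB and takes a reference. */
static void dmb_attach(struct dmb_node *n)
{
	assert(n->refcnt > 0);
	n->refcnt++;
}

/*
 * Drop one reference (called from the unregister or the detach path).
 * Whoever drops the last reference frees the DMB, so neither path has
 * to block and wait for the other. Returns true if the DMB was freed.
 */
static bool dmb_put(struct dmb_node *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt > 0)
		return false;
	free(n->mem);
	free(n);
	return true;
}
```

With this scheme the order of unregister and detach does not matter: the
memory stays valid until both sides are done with it.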
v1->RFC:
Link: https://lore.kernel.org/netdev/20240111120036.109903-1-guwen@linux.alibaba.com/
- Patch #9: merge rx_bytes and tx_bytes as xfer_bytes statistics:
/sys/devices/virtual/smc/loopback-ism/xfer_bytes
- Patch #10: add support_dmb_nocopy operation to check if SMC-D device supports
merging sndbuf with peer DMB.
- Patch #13 & #14: introduce loopback-ism device control of DMB memory type and
control of whether to merge sndbuf and DMB. They can be respectively set by:
/sys/devices/virtual/smc/loopback-ism/dmb_type
/sys/devices/virtual/smc/loopback-ism/dmb_copy
The motivation for these two controls is that a performance bottleneck was
found when a vzalloc'ed DMB is used and sndbuf is merged with the DMB, on
systems with many CPUs and CONFIG_HARDENED_USERCOPY set [4]. The bottleneck is
caused by lock contention on vmap_area_lock [5], which is involved in
memcpy_from_msg() or memcpy_to_msg(). Currently, Uladzislau Rezki is working on
mitigating the vmap lock contention [6]. It has significant effects, but using
virtual memory still has additional overhead compared to using physical memory.
So this new version provides controls of dmb_type and dmb_copy to suit
different scenarios.
- Some minor changes and comments improvements.
RFC->old version([1]):
Link: https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/
- Patch #1: improve the loopback-ism dump, it shows as follows now:
# smcd d
FID  Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
0000 0     loopback-ism  ffff   No        0
- Patch #3: introduce the smc_ism_set_v2_capable() helper and set
smc_ism_v2_capable when ISMv2 or virtual ISM is registered,
regardless of whether there is already a device in smcd device list.
- Patch #3: loopback-ism will be added into /sys/devices/virtual/smc/loopback-ism/.
- Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
to activate or deactivate the loopback-ism.
- Patch #9: introduce the statistics of loopback-ism by
/sys/devices/virtual/smc/loopback-ism/{{tx|rx}_bytes|dmbs_cnt}.
- Some minor changes and comments improvements.
[1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
[2] https://lore.kernel.org/netdev/20231219142616.80697-1-guwen@linux.alibaba.com/
[3] https://github.com/goldsborough/ipc-bench
[4] https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
[5] https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/
[6] https://lore.kernel.org/all/20240102184633.748113-1-urezki@gmail.com/
Wen Gu (11):
net/smc: decouple ism_client from SMC-D DMB registration
net/smc: introduce loopback-ism for SMC intra-OS shortcut
net/smc: implement ID-related operations of loopback-ism
net/smc: implement some unsupported operations of loopback-ism
net/smc: implement DMB-related operations of loopback-ism
net/smc: ignore loopback-ism when dumping SMC-D devices
net/smc: register loopback-ism into SMC-D device list
net/smc: add operations to merge sndbuf with peer DMB
net/smc: {at|de}tach sndbuf to peer DMB if supported
net/smc: adapt cursor update when sndbuf and peer DMB are merged
net/smc: implement DMB-merged operations of loopback-ism
drivers/s390/net/ism_drv.c | 2 +-
include/net/smc.h | 7 +-
net/smc/Kconfig | 13 ++
net/smc/Makefile | 2 +-
net/smc/af_smc.c | 28 ++-
net/smc/smc_cdc.c | 34 ++-
net/smc/smc_core.c | 61 ++++-
net/smc/smc_core.h | 1 +
net/smc/smc_ism.c | 60 ++++-
net/smc/smc_ism.h | 10 +
net/smc/smc_loopback.c | 461 +++++++++++++++++++++++++++++++++++++
net/smc/smc_loopback.h | 52 +++++
12 files changed, 715 insertions(+), 16 deletions(-)
create mode 100644 net/smc/smc_loopback.c
create mode 100644 net/smc/smc_loopback.h
--
2.32.0.3.g01195cf9f
On 2024/3/24 21:55, Wen Gu wrote:
> This patch set acts as the second part of the new version of [1] (The first
> part can be referred from [2]), the updated things of this version are listed
> at the end.

> Change log:
>
> RFC v5->RFC v4:
> - Patch #2: minor changes in description of config SMC_LO and comments.
> - Patch #10: minor changes in comments and if(smc_ism_support_dmb_nocopy())
>   check in smcd_cdc_msg_send().
> - Patch #3: change smc_lo_generate_id() to smc_lo_generate_ids() and SMC_LO_CHID
>   to SMC_LO_RESERVED_CHID.
> - Patch #5: memcpy while holding the ldev->dmb_ht_lock.
> - Some expression changes in commit logs.
>

Hi, Jan.

Do you have any comments on this version and should I post a new patch
series without 'RFC'?

Thank you.
On Wed, 2024-04-03 at 14:35 +0800, Wen Gu wrote:
> Hi, Jan. Do you have any comments on this version and should I post a
> new patch series without 'RFC'? Thank you.

Hi Wen,

Jan has been out sick for a little while now, and Wenjia is expected
back from a longer vacation tomorrow. So if you could hold off until
begin of next week, Wenjia might have some more feedback.

In the meantime, I'm looking at your patchset...

Thank you,
Gerd
On 2024/4/3 19:10, Gerd Bayer wrote:
> Hi Wen,
>
> Jan has been out sick for a little while now, and Wenjia is expected
> back from a longer vacation tomorrow. So if you could hold off until
> begin of next week, Wenjia might have some more feedback.
>
> In the meantime, I'm looking at your patchset...
>
> Thank you, Gerd
>

Hi Gerd, is there any further information? I am wondering if I
should wait for more feedback from SMC maintainers. Thanks!

Hi Wenjia, when it's convenient for you, could you please confirm
if [1] and [2] need to be included in the next version? Thanks!

[1] https://lore.kernel.org/netdev/7291dd1b2d16fd9bbd90988ac5bcc3a46d17e3f4.camel@linux.ibm.com/
[2] https://lore.kernel.org/netdev/60b4aec0b4bf4474d651b653c86c280dafc4518a.camel@linux.ibm.com/
On 11.04.24 09:45, Wen Gu wrote:
> Hi Wenjia, when it's convenient for you, could you please confirm
> if [1] and [2] need to be included in the next version? Thanks!
>
> [1] https://lore.kernel.org/netdev/7291dd1b2d16fd9bbd90988ac5bcc3a46d17e3f4.camel@linux.ibm.com/
> [2] https://lore.kernel.org/netdev/60b4aec0b4bf4474d651b653c86c280dafc4518a.camel@linux.ibm.com/
>

Hi Wen,

I'm just back, thank you for the patience!

Firstly I want to thank Gerd and Niklas for the review and for bringing up
these points!

Here are some of my opinions on that:

To [1]:
I agree to document the ops as optional if they need not be supported. Since I
don't really have any ideas, the classification sounds reasonable to me. Going
into the details, what about taking the following as mandatory:

* query_remote_gid()
* register_dmb()/unregister_dmb()
* move_data(): I do see the necessity here.
* get_local_gid()
* get_chid()
* get_dev()

To [2]:
I also agree to keep the ism-loopback at the very beginning of the list. That
behavior is also what I imagined previously. Thank you, Gerd, again for testing
it and finding it out!

Thanks,
Wenjia
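
For illustration only, the mandatory/optional split proposed above could be
enforced along the following lines. This is a user-space sketch with
placeholder signatures, not the real SMC-D ops table: struct demo_ops,
demo_ops_valid(), demo_attach_dmb() and demo_stub() are hypothetical names, and
treating attach_dmb, detach_dmb and support_dmb_nocopy as optional is an
assumption made here because they are not in the mandatory list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; all signatures are placeholders. */
struct demo_ops {
	/* mandatory, per the list above */
	int (*query_remote_gid)(void);
	int (*register_dmb)(void);
	int (*unregister_dmb)(void);
	int (*move_data)(void);
	int (*get_local_gid)(void);
	int (*get_chid)(void);
	int (*get_dev)(void);
	/* treated as optional in this sketch */
	int (*support_dmb_nocopy)(void);
	int (*attach_dmb)(void);
	int (*detach_dmb)(void);
};

/* Trivial callback used to populate the table in examples. */
static int demo_stub(void)
{
	return 0;
}

/* A device may be registered only if every mandatory op is provided. */
static bool demo_ops_valid(const struct demo_ops *ops)
{
	return ops &&
	       ops->query_remote_gid &&
	       ops->register_dmb && ops->unregister_dmb &&
	       ops->move_data &&
	       ops->get_local_gid &&
	       ops->get_chid &&
	       ops->get_dev;
}

/* Callers check an optional op before use and fall back gracefully. */
static int demo_attach_dmb(const struct demo_ops *ops)
{
	assert(ops != NULL);
	if (!ops->attach_dmb)
		return -EOPNOTSUPP;
	return ops->attach_dmb();
}
```

Checking optional ops at the call site keeps a device from having to provide
stub implementations for operations it cannot support.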
On 2024/4/11 17:32, Wenjia Zhang wrote:
> Hi Wen,
>
> I'm just back, thank you for the patience!
>
> Firstly I want to thank Gerd and Niklas for the review and for bringing up
> these points!
>
> Here are some of my opinions on that:
>
> To [1]:
> I agree to document the ops as optional if they need not be supported. Since I
> don't really have any ideas, the classification sounds reasonable to me. Going
> into the details, what about taking the following as mandatory:
>
> * query_remote_gid()
> * register_dmb()/unregister_dmb()
> * move_data(): I do see the necessity here.
> * get_local_gid()
> * get_chid()
> * get_dev()
>
> To [2]:
> I also agree to keep the ism-loopback at the very beginning of the list. That
> behavior is also what I imagined previously. Thank you, Gerd, again for testing
> it and finding it out!
>
> Thanks,
> Wenjia

Hi Wenjia, welcome back! :)

OK, then I will take these in my next version. Thank you all!
On 2024/4/3 19:10, Gerd Bayer wrote:
> Hi Wen,
>
> Jan has been out sick for a little while now, and Wenjia is expected
> back from a longer vacation tomorrow. So if you could hold off until
> begin of next week, Wenjia might have some more feedback.
>
> In the meantime, I'm looking at your patchset...
>
> Thank you, Gerd
>

Hi Gerd,

Thank you for the information and comments! I guess I will post a
new version at the beginning of next week.

Thanks!