drivers/s390/net/ism_drv.c | 20 +- include/net/smc.h | 32 ++- include/uapi/linux/smc.h | 3 + include/uapi/linux/smc_diag.h | 2 + net/smc/Kconfig | 13 ++ net/smc/Makefile | 2 +- net/smc/af_smc.c | 88 ++++++-- net/smc/smc.h | 7 + net/smc/smc_cdc.c | 56 ++++- net/smc/smc_cdc.h | 1 + net/smc/smc_clc.c | 64 ++++-- net/smc/smc_clc.h | 10 +- net/smc/smc_core.c | 111 +++++++++- net/smc/smc_core.h | 9 +- net/smc/smc_diag.c | 11 +- net/smc/smc_ism.c | 100 ++++++--- net/smc/smc_ism.h | 24 ++- net/smc/smc_loopback.c | 489 ++++++++++++++++++++++++++++++++++++++++++ net/smc/smc_loopback.h | 54 +++++ net/smc/smc_pnet.c | 4 +- 20 files changed, 996 insertions(+), 104 deletions(-) create mode 100644 net/smc/smc_loopback.c create mode 100644 net/smc/smc_loopback.h
Hi, all # Background SMC-D is now used in IBM z with ISM function to optimize network interconnect for intra-CPC communications. Inspired by this, we try to make SMC-D available on the non-s390 architecture through a software-simulated virtual ISM device, such as loopback-ism device here, to accelerate inter-process or inter-containers communication within the same OS. # Design This patch set includes 4 parts: - Patch #1-#3: decouple ISM device hard code from SMC-D stack. - Patch #4-#8: implement virtual ISM extension defined in SMCv2.1. - Patch #9-#13: implement loopback-ism device. - Patch #14-#18: memory copy optimization for the case using loopback. The loopback-ism device is designed as a kernel device and not be limited to a specific net namespace, ends of both inter-process connection (1/1' in diagram below) or inter-container connection (2/2' in diagram below) will find that peer shares the same loopback-ism device during the CLC handshake. Then loopback-ism device will be chosen. Container 1 (ns1) Container 2 (ns2) +-----------------------------------------+ +-------------------------+ | +-------+ +-------+ +-------+ | | +-------+ | | | App A | | App B | | App C | | | | App D |<-+ | | +-------+ +---^---+ +-------+ | | +-------+ |(2') | | |127.0.0.1 (1')| |192.168.0.11 192.168.0.12| | | (1)| +--------+ | +--------+ |(2) | | +--------+ +--------+ | | `-->| lo |-` | eth0 |<-` | | | lo | | eth0 | | +---------+--|---^-+---+-----|--+---------+ +-+--------+---+-^------+-+ | | | | Kernel | | | | +----+-------v---+-----------v----------------------------------+---+----+ | | TCP | | | | | | | +--------------------------------------------------------------+ | | | | +--------------+ | | | smc loopback | | +---------------------------+--------------+-----------------------------+ loopback-ism device allocs RMBs and sndbufs for each connection peer and 'moves' data from sndbuf at one end to RMB at the other end. Since communication occurs within the same kernel, the sndbuf can be mapped to peer RMB so that the data copy in loopback-ism case can be avoided. Container 1 (ns1) Container 2 (ns2) +-----------------------------------------+ +-------------------------+ | +-------+ +-------+ +-------+ | | +-------+ | | | App A | | App B | | App C | | | | App D | | | +-------+ +--^----+ +-------+ | | +---^---+ | | | | | | | | | | (1) | (1') | (2) | | | (2') | | | | | | | | | | +-------|-----------|---------------|-----+ +------------|------------+ | | | | Kernel | | | | +-------|-----------|---------------|-----------------------|------------+ | +-----v-+ +-------+ +---v---+ +-------+ | | | snd A |-+ | RMB B |<--+ | snd C |-+ +->| RMB D | | | +-------+ | +-------+ | +-------+ | | +-------+ | | +-------+ | +-------+ | +-------+ | | +-------+ | | | RMB A | | | snd B | | | RMB C | | | | snd D | | | +-------+ | +-------+ | +-------+ | | +-------+ | | | +-------------v+ | | | +-------------->| smc loopback |---------+ | +---------------------------+--------------+-----------------------------+ # Benchmark Test * Test environments: - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem. - SMC sndbuf/RMB size 1MB. * Test object: - TCP: run on TCP loopback. - domain: run on UNIX domain. - SMC lo: run on SMC loopback device. 1. ipc-benchmark (see [1]) - ./<foo> -c 1000000 -s 100 TCP SMC-lo Message rate (msg/s) 81539 151251(+85.50%) 2. sockperf - serv: <smc_run> taskset -c <cpu> sockperf sr --tcp - clnt: <smc_run> taskset -c <cpu> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30 TCP SMC-lo Bandwidth(MBps) 5313.66 8270.51(+55.65%) Latency(us) 5.806 3.207(-44.76%) 3. nginx/wrk - serv: <smc_run> nginx - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80 TCP SMC-lo Requests/s 194641.79 258656.13(+32.89%) 4. redis-benchmark - serv: <smc_run> redis-server - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024 TCP SMC-lo GET(Requests/s) 85855.34 115640.35(+34.69%) SET(Requests/s) 86337.15 118203.30(+36.90%) [1] https://github.com/goldsborough/ipc-bench v3->v4: - Fix build warning of patch#12 about: dmb_node->dma_addr = (dma_addr_t)dmb_node->cpu_addr; - Replace kzalloc with vzalloc to alloc DMB in loopback-ism, which may cause throughput or QPS to drop by 2% to 8%. see: https://lore.kernel.org/netdev/d6facfd5-e083-ffc7-05e5-2e8f3ef17735@linux.alibaba.com/ - Add SMC_LO in Kconfig to turn on/off loopback-ism. - Make extension GID cleaner. v2->v3: - Fix build warning of patch#1 and patch#10. v1->v2: - Fix build error on s390 arch. Wen Gu (18): net/smc: decouple ism_dev from SMC-D device dump net/smc: decouple ism_dev from SMC-D DMB registration net/smc: extract v2 check helper from SMC-D device registration net/smc: support SMCv2.x supplemental features negotiation net/smc: reserve CHID range for SMC-D virtual device net/smc: extend GID to 128bits only for virtual ISM device net/smc: disable SEID on non-s390 architecture net/smc: enable virtual ISM device feature bit net/smc: introduce SMC-D loopback device net/smc: implement ID-related operations of loopback net/smc: implement some unsupported operations of loopback net/smc: implement DMB-related operations of loopback net/smc: register loopback device as SMC-Dv2 device net/smc: add operation for getting DMB attribute net/smc: add operations for DMB attach and detach net/smc: avoid data copy from sndbuf to peer RMB in SMC-D net/smc: modify cursor update logic when sndbuf mapped to RMB net/smc: add interface implementation of loopback device drivers/s390/net/ism_drv.c | 20 +- include/net/smc.h | 32 ++- include/uapi/linux/smc.h | 3 + include/uapi/linux/smc_diag.h | 2 + net/smc/Kconfig | 13 ++ net/smc/Makefile | 2 +- net/smc/af_smc.c | 88 ++++++-- net/smc/smc.h | 7 + net/smc/smc_cdc.c | 56 ++++- net/smc/smc_cdc.h | 1 + net/smc/smc_clc.c | 64 ++++-- net/smc/smc_clc.h | 10 +- net/smc/smc_core.c | 111 +++++++++- net/smc/smc_core.h | 9 +- net/smc/smc_diag.c | 11 +- net/smc/smc_ism.c | 100 ++++++--- net/smc/smc_ism.h | 24 ++- net/smc/smc_loopback.c | 489 ++++++++++++++++++++++++++++++++++++++++++ net/smc/smc_loopback.h | 54 +++++ net/smc/smc_pnet.c | 4 +- 20 files changed, 996 insertions(+), 104 deletions(-) create mode 100644 net/smc/smc_loopback.c create mode 100644 net/smc/smc_loopback.h -- 1.8.3.1
On 24.09.23 17:16, Wen Gu wrote: > Wen Gu (18): > net/smc: decouple ism_dev from SMC-D device dump > net/smc: decouple ism_dev from SMC-D DMB registration > net/smc: extract v2 check helper from SMC-D device registration > net/smc: support SMCv2.x supplemental features negotiation > net/smc: reserve CHID range for SMC-D virtual device > net/smc: extend GID to 128bits only for virtual ISM device > net/smc: disable SEID on non-s390 architecture > net/smc: enable virtual ISM device feature bit > net/smc: introduce SMC-D loopback device > net/smc: implement ID-related operations of loopback > net/smc: implement some unsupported operations of loopback > net/smc: implement DMB-related operations of loopback > net/smc: register loopback device as SMC-Dv2 device > net/smc: add operation for getting DMB attribute > net/smc: add operations for DMB attach and detach > net/smc: avoid data copy from sndbuf to peer RMB in SMC-D > net/smc: modify cursor update logic when sndbuf mapped to RMB > net/smc: add interface implementation of loopback device > > drivers/s390/net/ism_drv.c | 20 +- > include/net/smc.h | 32 ++- > include/uapi/linux/smc.h | 3 + > include/uapi/linux/smc_diag.h | 2 + > net/smc/Kconfig | 13 ++ > net/smc/Makefile | 2 +- > net/smc/af_smc.c | 88 ++++++-- > net/smc/smc.h | 7 + > net/smc/smc_cdc.c | 56 ++++- > net/smc/smc_cdc.h | 1 + > net/smc/smc_clc.c | 64 ++++-- > net/smc/smc_clc.h | 10 +- > net/smc/smc_core.c | 111 +++++++++- > net/smc/smc_core.h | 9 +- > net/smc/smc_diag.c | 11 +- > net/smc/smc_ism.c | 100 ++++++--- > net/smc/smc_ism.h | 24 ++- > net/smc/smc_loopback.c | 489 ++++++++++++++++++++++++++++++++++++++++++ > net/smc/smc_loopback.h | 54 +++++ > net/smc/smc_pnet.c | 4 +- > 20 files changed, 996 insertions(+), 104 deletions(-) > create mode 100644 net/smc/smc_loopback.c > create mode 100644 net/smc/smc_loopback.h Hello Wen Gu, I applied and built your patches and noticed some things that you may want to consider in the next version: Series should be split up [2] Several lines exceed 80 columns [1][3] 'git clang-format HEAD~18' finds several formatting issues. Maybe not all of them need to be fixed. codespell *.patch 0006-net-smc-extend-GID-to-128bits-only-for-virtual-ISM-d.patch:7: protocal ==> protocol With your patches applied I get some new warnings [4]: Seems there are some ntoh conversions missing CHECK net/smc/af_smc.c net/smc/af_smc.c:723:32: warning: cast to restricted __be64 net/smc/af_smc.c:1427:52: warning: cast to restricted __be64 CHECK net/smc/smc_pnet.c CHECK net/smc/smc_ib.c CHECK net/smc/smc_clc.c net/smc/smc_clc.c:954:72: warning: incorrect type in argument 1 (different base types) net/smc/smc_clc.c:954:72: expected unsigned short [usertype] chid net/smc/smc_clc.c:954:72: got restricted __be16 [usertype] chid net/smc/smc_clc.c:1050:29: warning: incorrect type in assignment (different base types) net/smc/smc_clc.c:1050:29: expected unsigned long long [usertype] gid net/smc/smc_clc.c:1050:29: got restricted __be64 [usertype] net/smc/smc_clc.c:1051:31: warning: incorrect type in assignment (different base types) net/smc/smc_clc.c:1051:31: expected unsigned long long [usertype] token net/smc/smc_clc.c:1051:31: got restricted __be64 [usertype] [1] linux/Documentation/process/coding-style.rst [2] https://www.kernel.org/doc/html/v6.3/process/maintainer-netdev.html?highlight=network [3] scripts/checkpatch.pl --strict --max-line-length=80 --git HEAD-18 [4] make C=2 CF=-D__CHECK_ENDIAN__ M=net/smc -Wunused-function -Wimplicit-fallthrough -Wincompatible-function-pointer-types-strict When I installed the patches, I noticed that > smcd info showed an SEID, even though I had no ISM device --> good > smcd device FID Type PCI-ID PCHID InUse #LGs PNET-ID 0000 0 0000 Yes 2 This needs some improvements.., but I'm not sure what is the best way to display virtual smcd interfaces in the smc-tools. I was able to do SMC transfers via the smcd-loopback feature :-D
On 2023/9/27 23:16, Alexandra Winter wrote: > > > On 24.09.23 17:16, Wen Gu wrote: >> Wen Gu (18): >> net/smc: decouple ism_dev from SMC-D device dump >> net/smc: decouple ism_dev from SMC-D DMB registration >> net/smc: extract v2 check helper from SMC-D device registration >> net/smc: support SMCv2.x supplemental features negotiation >> net/smc: reserve CHID range for SMC-D virtual device >> net/smc: extend GID to 128bits only for virtual ISM device >> net/smc: disable SEID on non-s390 architecture >> net/smc: enable virtual ISM device feature bit >> net/smc: introduce SMC-D loopback device >> net/smc: implement ID-related operations of loopback >> net/smc: implement some unsupported operations of loopback >> net/smc: implement DMB-related operations of loopback >> net/smc: register loopback device as SMC-Dv2 device >> net/smc: add operation for getting DMB attribute >> net/smc: add operations for DMB attach and detach >> net/smc: avoid data copy from sndbuf to peer RMB in SMC-D >> net/smc: modify cursor update logic when sndbuf mapped to RMB >> net/smc: add interface implementation of loopback device >> >> drivers/s390/net/ism_drv.c | 20 +- >> include/net/smc.h | 32 ++- >> include/uapi/linux/smc.h | 3 + >> include/uapi/linux/smc_diag.h | 2 + >> net/smc/Kconfig | 13 ++ >> net/smc/Makefile | 2 +- >> net/smc/af_smc.c | 88 ++++++-- >> net/smc/smc.h | 7 + >> net/smc/smc_cdc.c | 56 ++++- >> net/smc/smc_cdc.h | 1 + >> net/smc/smc_clc.c | 64 ++++-- >> net/smc/smc_clc.h | 10 +- >> net/smc/smc_core.c | 111 +++++++++- >> net/smc/smc_core.h | 9 +- >> net/smc/smc_diag.c | 11 +- >> net/smc/smc_ism.c | 100 ++++++--- >> net/smc/smc_ism.h | 24 ++- >> net/smc/smc_loopback.c | 489 ++++++++++++++++++++++++++++++++++++++++++ >> net/smc/smc_loopback.h | 54 +++++ >> net/smc/smc_pnet.c | 4 +- >> 20 files changed, 996 insertions(+), 104 deletions(-) >> create mode 100644 net/smc/smc_loopback.c >> create mode 100644 net/smc/smc_loopback.h > > > Hello Wen Gu, > > I applied and built your patches and noticed some things that you may want to consider in the next version: > > Series should be split up [2] > > Several lines exceed 80 columns [1][3] > > 'git clang-format HEAD~18' finds several formatting issues. > Maybe not all of them need to be fixed. > > codespell *.patch > 0006-net-smc-extend-GID-to-128bits-only-for-virtual-ISM-d.patch:7: protocal ==> protocol > > With your patches applied I get some new warnings [4]: > Seems there are some ntoh conversions missing > > CHECK net/smc/af_smc.c > net/smc/af_smc.c:723:32: warning: cast to restricted __be64 > net/smc/af_smc.c:1427:52: warning: cast to restricted __be64 > CHECK net/smc/smc_pnet.c > CHECK net/smc/smc_ib.c > CHECK net/smc/smc_clc.c > net/smc/smc_clc.c:954:72: warning: incorrect type in argument 1 (different base types) > net/smc/smc_clc.c:954:72: expected unsigned short [usertype] chid > net/smc/smc_clc.c:954:72: got restricted __be16 [usertype] chid > net/smc/smc_clc.c:1050:29: warning: incorrect type in assignment (different base types) > net/smc/smc_clc.c:1050:29: expected unsigned long long [usertype] gid > net/smc/smc_clc.c:1050:29: got restricted __be64 [usertype] > net/smc/smc_clc.c:1051:31: warning: incorrect type in assignment (different base types) > net/smc/smc_clc.c:1051:31: expected unsigned long long [usertype] token > net/smc/smc_clc.c:1051:31: got restricted __be64 [usertype] > > > [1] linux/Documentation/process/coding-style.rst > [2] https://www.kernel.org/doc/html/v6.3/process/maintainer-netdev.html?highlight=network > [3] scripts/checkpatch.pl --strict --max-line-length=80 --git HEAD-18 > [4] make C=2 CF=-D__CHECK_ENDIAN__ M=net/smc -Wunused-function -Wimplicit-fallthrough -Wincompatible-function-pointer-types-strict > > Hi Sandy, Thank you very much for your detailed comments. They are really helpful. I will check and fix them in my next version. > > When I installed the patches, I noticed that >> smcd info > showed an SEID, even though I had no ISM device --> good > Yes, virtual ISM device will return the right SEID (same as that returned by ISM device) on s390 arch. > >> smcd device > FID Type PCI-ID PCHID InUse #LGs PNET-ID > 0000 0 0000 Yes 2 > > This needs some improvements.., but I'm not sure what is the best way to display virtual smcd interfaces in the smc-tools. > Right. My idea is to add the display of the device's name and GID along with the PCI information (all zero) to indicate it's virtual smcd device. > > I was able to do SMC transfers via the smcd-loopback feature :-D > Glad to hear this. I will improve the issues you mentioned in the next version and next version will be split up. Thank you very much. Wen Gu
On 27.09.23 17:16, Alexandra Winter wrote: > Hello Wen Gu, > > I applied and built your patches and noticed some things that you may want to consider in the next version: FYI, patchwork basically complains about many the same issues: https://patchwork.kernel.org/project/netdevbpf/list/?series=787037&state=* In general you should run those check BEFORE you send the patches and not rely on patchwork.
On 2023/9/28 16:56, Alexandra Winter wrote: > > > On 27.09.23 17:16, Alexandra Winter wrote: >> Hello Wen Gu, >> >> I applied and built your patches and noticed some things that you may want to consider in the next version: > > > FYI, patchwork basically complains about many the same issues: > https://patchwork.kernel.org/project/netdevbpf/list/?series=787037&state=* > > In general you should run those check BEFORE you send the patches and not rely on patchwork. Thank you Sandy. I seem to have not seen the specific content of these checks. May I ask how to run those patchwork check locally? So that I can make sure everything is ok before send them.
On 28.09.23 19:29, Wen Gu wrote: > > > On 2023/9/28 16:56, Alexandra Winter wrote: >> >> >> On 27.09.23 17:16, Alexandra Winter wrote: >>> Hello Wen Gu, >>> >>> I applied and built your patches and noticed some things that you may want to consider in the next version: >> >> >> FYI, patchwork basically complains about many the same issues: >> https://patchwork.kernel.org/project/netdevbpf/list/?series=787037&state=* >> >> In general you should run those check BEFORE you send the patches and not rely on patchwork. > Thank you Sandy. I seem to have not seen the specific content of these checks. May I ask how to > run those patchwork check locally? So that I can make sure everything is ok before send them. > Citing from Documentation/process/maintainer-netdev.rst : "patchwork checks ~~~~~~~~~~~~~~~~ Checks in patchwork are mostly simple wrappers around existing kernel scripts, the sources are available at: https://github.com/kuba-moo/nipa/tree/master/tests **Do not** post your patches just to run them through the checks. You must ensure that your patches are ready by testing them locally before posting to the mailing list. The patchwork build bot instance gets overloaded very easily and netdev@vger really doesn't need more traffic if we can help it." HTH
On 2023/9/29 21:31, Alexandra Winter wrote: > > > On 28.09.23 19:29, Wen Gu wrote: >> >> >> On 2023/9/28 16:56, Alexandra Winter wrote: >>> >>> >>> On 27.09.23 17:16, Alexandra Winter wrote: >>>> Hello Wen Gu, >>>> >>>> I applied and built your patches and noticed some things that you may want to consider in the next version: >>> >>> >>> FYI, patchwork basically complains about many the same issues: >>> https://patchwork.kernel.org/project/netdevbpf/list/?series=787037&state=* >>> >>> In general you should run those check BEFORE you send the patches and not rely on patchwork. >> Thank you Sandy. I seem to have not seen the specific content of these checks. May I ask how to >> run those patchwork check locally? So that I can make sure everything is ok before send them. >> > > Citing from Documentation/process/maintainer-netdev.rst : > > "patchwork checks > ~~~~~~~~~~~~~~~~ > > Checks in patchwork are mostly simple wrappers around existing kernel > scripts, the sources are available at: > > https://github.com/kuba-moo/nipa/tree/master/tests > > **Do not** post your patches just to run them through the checks. > You must ensure that your patches are ready by testing them locally > before posting to the mailing list. The patchwork build bot instance > gets overloaded very easily and netdev@vger really doesn't need more > traffic if we can help it." > > HTH Thank you! Sandy.
On 24.09.23 17:16, Wen Gu wrote: > This patch set includes 4 parts: > > - Patch #1-#3: decouple ISM device hard code from SMC-D stack. > - Patch #4-#8: implement virtual ISM extension defined in SMCv2.1. > - Patch #9-#13: implement loopback-ism device. > - Patch #14-#18: memory copy optimization for the case using loopback. Your cover letter is very well helpful, thanks a lot for that. I really like the way you grouped the patches. Just a thought: If it takes too long to get this large patchset reviewed, you could split it up into smaller sets and get them upstream one after the other. I think it is especially valuable to more crisply define the interface between SMC-D and the smc-d devices, given that virtio-ism may soon be a third kind of smcd device.
On Sun, 2023-09-24 at 23:16 +0800, Wen Gu wrote: > Hi, all > > # Background > > SMC-D is now used in IBM z with ISM function to optimize network interconnect > for intra-CPC communications. Inspired by this, we try to make SMC-D available > on the non-s390 architecture through a software-simulated virtual ISM device, > such as loopback-ism device here, to accelerate inter-process or inter-containers > communication within the same OS. > > # Design > > This patch set includes 4 parts: > > - Patch #1-#3: decouple ISM device hard code from SMC-D stack. > - Patch #4-#8: implement virtual ISM extension defined in SMCv2.1. > - Patch #9-#13: implement loopback-ism device. > - Patch #14-#18: memory copy optimization for the case using loopback. > > The loopback-ism device is designed as a kernel device and not be limited to > a specific net namespace, ends of both inter-process connection (1/1' in diagram > below) or inter-container connection (2/2' in diagram below) will find that peer > shares the same loopback-ism device during the CLC handshake. Then loopback-ism > device will be chosen. > > Container 1 (ns1) Container 2 (ns2) > +-----------------------------------------+ +-------------------------+ > | +-------+ +-------+ +-------+ | | +-------+ | > | | App A | | App B | | App C | | | | App D |<-+ | > | +-------+ +---^---+ +-------+ | | +-------+ |(2') | > | |127.0.0.1 (1')| |192.168.0.11 192.168.0.12| | > | (1)| +--------+ | +--------+ |(2) | | +--------+ +--------+ | > | `-->| lo |-` | eth0 |<-` | | | lo | | eth0 | | > +---------+--|---^-+---+-----|--+---------+ +-+--------+---+-^------+-+ > | | | | > Kernel | | | | > +----+-------v---+-----------v----------------------------------+---+----+ > | | TCP | | > | | | | > | +--------------------------------------------------------------+ | > | | > | +--------------+ | > | | smc loopback | | > +---------------------------+--------------+-----------------------------+ > > > loopback-ism device allocs RMBs and sndbufs for each connection peer and 'moves' > data from sndbuf at one end to RMB at the other end. Since communication occurs > within the same kernel, the sndbuf can be mapped to peer RMB so that the data > copy in loopback-ism case can be avoided. > > Container 1 (ns1) Container 2 (ns2) > +-----------------------------------------+ +-------------------------+ > | +-------+ +-------+ +-------+ | | +-------+ | > | | App A | | App B | | App C | | | | App D | | > | +-------+ +--^----+ +-------+ | | +---^---+ | > | | | | | | | | > | (1) | (1') | (2) | | | (2') | | > | | | | | | | | > +-------|-----------|---------------|-----+ +------------|------------+ > | | | | > Kernel | | | | > +-------|-----------|---------------|-----------------------|------------+ > | +-----v-+ +-------+ +---v---+ +-------+ | > | | snd A |-+ | RMB B |<--+ | snd C |-+ +->| RMB D | | > | +-------+ | +-------+ | +-------+ | | +-------+ | > | +-------+ | +-------+ | +-------+ | | +-------+ | > | | RMB A | | | snd B | | | RMB C | | | | snd D | | > | +-------+ | +-------+ | +-------+ | | +-------+ | > | | +-------------v+ | | > | +-------------->| smc loopback |---------+ | > +---------------------------+--------------+-----------------------------+ > > # Benchmark Test > > * Test environments: > - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem. > - SMC sndbuf/RMB size 1MB. > > * Test object: > - TCP: run on TCP loopback. > - domain: run on UNIX domain. > - SMC lo: run on SMC loopback device. > > 1. ipc-benchmark (see [1]) > > - ./<foo> -c 1000000 -s 100 > > TCP SMC-lo > Message > rate (msg/s) 81539 151251(+85.50%) > > 2. sockperf > > - serv: <smc_run> taskset -c <cpu> sockperf sr --tcp > - clnt: <smc_run> taskset -c <cpu> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30 > > TCP SMC-lo > Bandwidth(MBps) 5313.66 8270.51(+55.65%) > Latency(us) 5.806 3.207(-44.76%) > > 3. nginx/wrk > > - serv: <smc_run> nginx > - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80 > > TCP SMC-lo > Requests/s 194641.79 258656.13(+32.89%) > > 4. redis-benchmark > > - serv: <smc_run> redis-server > - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024 > > TCP SMC-lo > GET(Requests/s) 85855.34 115640.35(+34.69%) > SET(Requests/s) 86337.15 118203.30(+36.90%) > > [1] https://github.com/goldsborough/ipc-bench > Hi Wen Gu, I've been trying out your series with iperf3, qperf, and uperf on s390x. I'm using network namespaces with a ConnectX VF from the same card in each namespace for the initial TCP/IP connection i.e. initially it goes out to a real NIC even if that can switch internally. All of these look great for streaming workloads both in terms of performance and stability. With a Connect-Request-Response workload and uperf however I've run into issues. The test configuration I use is as follows: Client Command: # host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml Server Command: # ip netns exec server smc_run uperf -s &> /dev/null Uperf tcp_crr.xml: <?xml version="1.0"?> <profile name="TCP_CRR"> <group nthreads="12"> <transaction duration="120"> <flowop type="connect" options="remotehost=$host protocol=tcp" /> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> <flowop type="disconnect" /> </transaction> </group> </profile> The workload first runs fine but then after about 4 GB of data transferred fails with "Connection refused" and "Connection reset by peer" errors. The failure is not permanent however and re-running the streaming workloads run fine again (with both uperf server and client restarted). So I suspect something gets stuck in either the client or server sockets. The same workload runs fine with TCP/IP of course. Thanks, Niklas
On 2023/10/5 16:21, Niklas Schnelle wrote: > > Hi Wen Gu, > > I've been trying out your series with iperf3, qperf, and uperf on > s390x. I'm using network namespaces with a ConnectX VF from the same > card in each namespace for the initial TCP/IP connection i.e. initially > it goes out to a real NIC even if that can switch internally. All of > these look great for streaming workloads both in terms of performance > and stability. With a Connect-Request-Response workload and uperf > however I've run into issues. The test configuration I use is as > follows: > > Client Command: > > # host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml > > Server Command: > > # ip netns exec server smc_run uperf -s &> /dev/null > > Uperf tcp_crr.xml: > > <?xml version="1.0"?> > <profile name="TCP_CRR"> > <group nthreads="12"> > <transaction duration="120"> > <flowop type="connect" options="remotehost=$host protocol=tcp" /> > <flowop type="write" options="size=200"/> > <flowop type="read" options="size=1000"/> > <flowop type="disconnect" /> > </transaction> > </group> > </profile> > > The workload first runs fine but then after about 4 GB of data > transferred fails with "Connection refused" and "Connection reset by > peer" errors. The failure is not permanent however and re-running > the streaming workloads run fine again (with both uperf server and > client restarted). So I suspect something gets stuck in either the > client or server sockets. The same workload runs fine with TCP/IP of > course. > > Thanks, > Niklas > > Hi Niklas, Thank you very much for the test. With the test example you provided, I've reproduced the issue in my VM. And moreover, sometimes the test complains with 'Error saying goodbye with <ip>' I'll figure out what's going on here. Thanks! Wen Gu
On 2023/10/8 15:19, Wen Gu wrote: > > > On 2023/10/5 16:21, Niklas Schnelle wrote: > >> >> Hi Wen Gu, >> >> I've been trying out your series with iperf3, qperf, and uperf on >> s390x. I'm using network namespaces with a ConnectX VF from the same >> card in each namespace for the initial TCP/IP connection i.e. initially >> it goes out to a real NIC even if that can switch internally. All of >> these look great for streaming workloads both in terms of performance >> and stability. With a Connect-Request-Response workload and uperf >> however I've run into issues. The test configuration I use is as >> follows: >> >> Client Command: >> >> # host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml >> >> Server Command: >> >> # ip netns exec server smc_run uperf -s &> /dev/null >> >> Uperf tcp_crr.xml: >> >> <?xml version="1.0"?> >> <profile name="TCP_CRR"> >> <group nthreads="12"> >> <transaction duration="120"> >> <flowop type="connect" options="remotehost=$host protocol=tcp" /> >> <flowop type="write" options="size=200"/> >> <flowop type="read" options="size=1000"/> >> <flowop type="disconnect" /> >> </transaction> >> </group> >> </profile> >> >> The workload first runs fine but then after about 4 GB of data >> transferred fails with "Connection refused" and "Connection reset by >> peer" errors. The failure is not permanent however and re-running >> the streaming workloads run fine again (with both uperf server and >> client restarted). So I suspect something gets stuck in either the >> client or server sockets. The same workload runs fine with TCP/IP of >> course. >> >> Thanks, >> Niklas >> >> > > Hi Niklas, > > Thank you very much for the test. With the test example you provided, I've > reproduced the issue in my VM. And moreover, sometimes the test complains > with 'Error saying goodbye with <ip>' > > I'll figure out what's going on here. > > Thanks! > Wen Gu I think that there is a common issue for SMC-R and SMC-D. I also reproduce 'connection reset by peer' and 'Error saying goodbye with <ip>' when using SMC-R under the same test condition. They occur at the end of the test. When the uperf test time ends, some signals are sent. At this point there are usually some SMC connections doing CLC handshake. I catch some -EINTR(-4) in client and -ECONNRESET(-104) in server returned from smc_clc_wait_msg, (correspondingly handshake error counts also increase) and TCP RST packets sent to terminate the CLC TCP connection(clcsock). I am not sure if this should be considered as a bydesign or a bug of SMC. From an application perspective, the conn reset behavior only happens when using SMC. @Wenjia, could you please take a look at this? Thanks, Wen Gu
On 17.10.23 05:49, Wen Gu wrote: > > > On 2023/10/8 15:19, Wen Gu wrote: >> >> >> On 2023/10/5 16:21, Niklas Schnelle wrote: >> >>> >>> Hi Wen Gu, >>> >>> I've been trying out your series with iperf3, qperf, and uperf on >>> s390x. I'm using network namespaces with a ConnectX VF from the same >>> card in each namespace for the initial TCP/IP connection i.e. initially >>> it goes out to a real NIC even if that can switch internally. All of >>> these look great for streaming workloads both in terms of performance >>> and stability. With a Connect-Request-Response workload and uperf >>> however I've run into issues. The test configuration I use is as >>> follows: >>> >>> Client Command: >>> >>> # host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml >>> >>> Server Command: >>> >>> # ip netns exec server smc_run uperf -s &> /dev/null >>> >>> Uperf tcp_crr.xml: >>> >>> <?xml version="1.0"?> >>> <profile name="TCP_CRR"> >>> <group nthreads="12"> >>> <transaction duration="120"> >>> <flowop type="connect" >>> options="remotehost=$host protocol=tcp" /> >>> <flowop type="write" options="size=200"/> >>> <flowop type="read" options="size=1000"/> >>> <flowop type="disconnect" /> >>> </transaction> >>> </group> >>> </profile> >>> >>> The workload first runs fine but then after about 4 GB of data >>> transferred fails with "Connection refused" and "Connection reset by >>> peer" errors. The failure is not permanent however and re-running >>> the streaming workloads run fine again (with both uperf server and >>> client restarted). So I suspect something gets stuck in either the >>> client or server sockets. The same workload runs fine with TCP/IP of >>> course. >>> >>> Thanks, >>> Niklas >>> >>> >> >> Hi Niklas, >> >> Thank you very much for the test. With the test example you provided, >> I've >> reproduced the issue in my VM. And moreover, sometimes the test complains >> with 'Error saying goodbye with <ip>' >> >> I'll figure out what's going on here. >> >> Thanks! >> Wen Gu > > I think that there is a common issue for SMC-R and SMC-D. I also reproduce > 'connection reset by peer' and 'Error saying goodbye with <ip>' when using > SMC-R under the same test condition. They occur at the end of the test. > > When the uperf test time ends, some signals are sent. At this point there > are usually some SMC connections doing CLC handshake. I catch some > -EINTR(-4) > in client and -ECONNRESET(-104) in server returned from smc_clc_wait_msg, > (correspondingly handshake error counts also increase) and TCP RST packets > sent to terminate the CLC TCP connection(clcsock). > > I am not sure if this should be considered as a bydesign or a bug of SMC. > From an application perspective, the conn reset behavior only happens when > using SMC. > > @Wenjia, could you please take a look at this? > > Thanks, > Wen Gu Hi Wen, Do you mean the bug in smc_clc_wait_msg()? If yes, I can not see any problem in the smc_clc_wait_msg(). From your description, it looks to me like the server should get the CLC_PROPOSAL message, but nothing in it while the client is waiting for the accept CLC_ACCEPT message from the server until the wait loops is broken out. Thanks, Wenjia
© 2016 - 2025 Red Hat, Inc.