From: mengkanglai
To: Kuniyuki Iwashima
CC: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com, "Fengtao (fengtao, Euler)", kuba@kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, pabeni@redhat.com, "Yanan (Euler)"
Subject: RE: Reply: kernel tcp sockets stuck in FIN_WAIT1 after call tcp_close
Date: Tue, 19 Nov 2024 08:38:26 +0000

> -----Original Message-----
> From: Kuniyuki Iwashima
> Sent: 14 November 2024 2:56
> To: mengkanglai
> Cc: davem@davemloft.net; dsahern@kernel.org; edumazet@google.com; Fengtao (fengtao, Euler); kuba@kernel.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; pabeni@redhat.com; Yanan (Euler); kuniyu@amazon.com
> Subject: Re: kernel tcp sockets stuck in FIN_WAIT1 after call tcp_close
>
> From: mengkanglai
> Date: Wed, 13 Nov 2024 12:40:34 +0000
> > Hello, Eric:
> > Commit 151c9c724d05 (tcp: properly terminate timers for kernel
> > sockets) introduce inet_csk_clear_xmit_timers_sync in tcp_close.
> > For kernel sockets it does not hold sk->sk_net_refcnt, if this is
> > kernel tcp socket it will call tcp_send_fin in __tcp_close to send FIN
> > packet to remotes server,
>
> Just curious which subsystem the kernel socket is created by.
>
> Recently, CIFS and sunrpc are (being) converted to hold net refcnt.
>
> CIFS: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ef7134c7fc48e1441b398e55a862232868a6f0a7
> sunrpc: https://lore.kernel.org/netdev/20241112135434.803890-1-liujian56@huawei.com/
>
> I remember RDS's listener does not hold refcnt but other client sockets (SMC, RDS, MPTCP, CIFS, sunrpc) do.
>
> I think all TCP kernel sockets should hold netns refcnt except for one created at pernet_operations.init() hook like RDS.
>
> > if this fin packet lost due to network faults, tcp should retransmit
> > this fin packet, but tcp_timer stopped by inet_csk_clear_xmit_timers_sync.
> > tcp sockets state will stuck in FIN_WAIT1 and never go away. I think
> > it's not right.

I found this problem when testing NFS. The sunrpc patch (https://lore.kernel.org/netdev/20241112135434.803890-1-liujian56@huawei.com/) will solve this problem.

I agree that all TCP kernel sockets should hold a netns refcnt.

However, this means that kernel TCP sockets created by other kernel modules through sock_create_kern or sk_alloc(kern=1) must now hold sk_net_refcnt; otherwise the FIN is sent only once and is not retransmitted when the socket is released. But other modules that use TCP may not be aware that they need to hold sk_net_refcnt.
Should we add a check in tcp_close?

---
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index fb920369c..6b92026a4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2804,7 +2804,7 @@ void tcp_close(struct sock *sk, long timeout)
 	lock_sock(sk);
 	__tcp_close(sk, timeout);
 	release_sock(sk);
-	if (!sk->sk_net_refcnt)
+	if (!net_eq(sock_net(sk), &init_net) && !sk->sk_net_refcnt)
 		inet_csk_clear_xmit_timers_sync(sk);
 	sock_put(sk);
 }
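
To make the effect of the proposed condition easier to see, here is a minimal user-space sketch (not kernel code) that compares the current and the proposed check in tcp_close. struct toy_sock, toy_sk_alloc and the other toy_* helpers are hypothetical names made up for illustration; only the two conditions mirror the diff above. The mapping from the kern argument to sk_net_refcnt is an assumption based on how sk_alloc treats kernel sockets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct sock properties discussed here. */
struct toy_sock {
	bool sk_net_refcnt;	/* true when the socket holds a netns reference */
	bool in_init_net;	/* true when the socket lives in the initial netns */
};

/* Assumption: kernel sockets (kern != 0) do not take a netns reference. */
static struct toy_sock toy_sk_alloc(int kern, bool in_init_net)
{
	struct toy_sock sk = {
		.sk_net_refcnt = !kern,
		.in_init_net = in_init_net,
	};
	return sk;
}

/* Current check: timers are stopped for every socket without a netns ref. */
static bool toy_timers_stopped_current(const struct toy_sock *sk)
{
	return !sk->sk_net_refcnt;
}

/* Proposed check: sockets in the initial netns keep their timers. */
static bool toy_timers_stopped_proposed(const struct toy_sock *sk)
{
	return !sk->in_init_net && !sk->sk_net_refcnt;
}

/* A lost FIN can only be retransmitted while the timers are still armed. */
static bool toy_fin_retransmit_possible(bool timers_stopped)
{
	return !timers_stopped;
}

/* Walks the cases from the discussion and asserts the expected outcome. */
void toy_self_check(void)
{
	struct toy_sock kern_init = toy_sk_alloc(1, true);
	struct toy_sock kern_other = toy_sk_alloc(1, false);
	struct toy_sock user_other = toy_sk_alloc(0, false);

	/* Kernel socket in init_net: stuck today, retransmits with the check. */
	assert(!toy_fin_retransmit_possible(toy_timers_stopped_current(&kern_init)));
	assert(toy_fin_retransmit_possible(toy_timers_stopped_proposed(&kern_init)));

	/* Kernel socket in another netns: timers are stopped either way. */
	assert(!toy_fin_retransmit_possible(toy_timers_stopped_current(&kern_other)));
	assert(!toy_fin_retransmit_possible(toy_timers_stopped_proposed(&kern_other)));

	/* Socket holding a netns reference: unaffected by both variants. */
	assert(toy_fin_retransmit_possible(toy_timers_stopped_current(&user_other)));
	assert(toy_fin_retransmit_possible(toy_timers_stopped_proposed(&user_other)));
}
```

In this model the proposed check only helps kernel sockets in init_net. A kernel socket in any other netns that does not hold sk_net_refcnt would still send its FIN only once, so such modules would still need to take the reference themselves.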