From nobody Mon Apr 6 22:52:07 2026
From: Zhang Yuchen
To: minyard@acm.org
Cc: openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org, qi.zheng@linux.dev, Zhang Yuchen
Subject: [PATCH 1/3] ipmi: fix msg stack when IPMI is disconnected
Date: Fri, 7 Oct 2022 17:26:15 +0800
Message-Id: <20221007092617.87597-2-zhangyuchen.lcr@bytedance.com>
In-Reply-To: <20221007092617.87597-1-zhangyuchen.lcr@bytedance.com>
References: <20221007092617.87597-1-zhangyuchen.lcr@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

If messages keep being sent at a high frequency (more often than once
every 55s) while the IPMI is disconnected, they accumulate in
intf->[hp_]xmit_msg. If this lasts long enough, the queue takes up a lot
of memory.
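As a rough illustration of why the queue grows, the sketch below models a
single queue that retires one message every 55s while new messages keep
arriving. It is a user-space toy model, not driver code: pending_msgs()
and its parameters are hypothetical names, and the 15s send interval used
in the note after the code is only an example value.

```c
/*
 * Toy model of the backlog, not kernel code.
 *
 * One message arrives every send_interval_s seconds, starting at t = 0.
 * Messages are handled one at a time, and each one takes drain_s seconds
 * to be retired (55s per message in the scenario described above).
 * Returns how many messages that have arrived by elapsed_s are still
 * waiting or in flight at that time.
 */
static long pending_msgs(long elapsed_s, long send_interval_s, long drain_s)
{
    long pending = 0;
    long done_time = 0;     /* when the previous message is retired */

    for (long t = 0; t <= elapsed_s; t += send_interval_s) {
        /* A message starts when it arrives or when the queue frees up. */
        long start = t > done_time ? t : done_time;

        done_time = start + drain_s;
        if (done_time > elapsed_s)
            pending++;
    }
    return pending;
}
```

In this model, pending_msgs(3600, 15, 55) is 176: sending every 15s leaves
176 messages queued after an hour, and the number keeps growing. With a
60s interval each message is retired before the next one arrives.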

The reason is that when the IPMI is disconnected, each message drives the
state machine through IDLE->ERROR0->HOSED, and once it reaches HOSED the
state is reset to IDLE. The next message that comes in goes through the
same process. This process has to wait for
IBF_TIMEOUT * (MAX_ERROR_RETRIES + 1) = 55s, so each message takes 55s to
destroy. This results in a continuous increase in memory.

I found that if I wait 5 seconds after the first message fails, the state
changes to ERROR0 in smi_timeout(). The next message then returns the
error code IPMI_NOT_IN_MY_STATE_ERR directly, without waiting. This is
more in line with our needs.

So instead of resetting the state to IDLE after a message reaches HOSED,
set the state to ERROR0.

In testing, the problem is solved: no matter how many messages are sent
back to back, memory no longer grows continuously. Operation also returns
to normal immediately after the IPMI is restored.

The verification steps are as follows:

1. Use BPF to record ipmi_alloc_recv_msg() and free_recv_msg().

  $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc %p\n",retval);} kprobe:free_recv_msg {printf("free %p\n",arg0)}'

2. Exec `date; time for x in $(seq 1 2); do ipmitool mc info; done`.
3. Record the output of `time` and the time at which all msgs are freed.

Before:

`time` takes about 120s. This is because `ipmitool mc info` sends 4 msgs
and waits only 15 seconds for each message. The last msg is freed after
440s.

$ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc %p\n",retval);} kprobe:free_recv_msg {printf("free %p\n",arg0)}'
Oct 05 11:40:55 Attaching 2 probes...
Oct 05 11:41:12 alloc 0xffff9558a05f0c00
Oct 05 11:41:27 alloc 0xffff9558a05f1a00
Oct 05 11:41:42 alloc 0xffff9558a05f0000
Oct 05 11:41:57 alloc 0xffff9558a05f1400
Oct 05 11:42:07 free 0xffff9558a05f0c00
Oct 05 11:42:07 alloc 0xffff9558a05f7000
Oct 05 11:42:22 alloc 0xffff9558a05f2a00
Oct 05 11:42:37 alloc 0xffff9558a05f5a00
Oct 05 11:42:52 alloc 0xffff9558a05f3a00
Oct 05 11:43:02 free 0xffff9558a05f1a00
Oct 05 11:43:57 free 0xffff9558a05f0000
Oct 05 11:44:52 free 0xffff9558a05f1400
Oct 05 11:45:47 free 0xffff9558a05f7000
Oct 05 11:46:42 free 0xffff9558a05f2a00
Oct 05 11:47:37 free 0xffff9558a05f5a00
Oct 05 11:48:32 free 0xffff9558a05f3a00

$ root@dc00-pb003-t106-n078:~# date;time for x in $(seq 1 2); do ipmitool mc info; done
Wed Oct 5 11:41:12 CST 2022
No data available
Get Device ID command failed
No data available
No data available
No valid response received
Get Device ID command failed: Unspecified error
No data available
Get Device ID command failed
No data available
No data available
No valid response received
No data available
Get Device ID command failed

real 1m55.052s
user 0m0.001s
sys 0m0.001s

After:

`time` takes 55s, and all msgs are returned and freed after 55s.

$ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc %p\n",retval);} kprobe:free_recv_msg {printf("free %p\n",arg0)}'
Oct 07 16:30:35 Attaching 2 probes...
Oct 07 16:30:45 alloc 0xffff955943aa9800
Oct 07 16:31:00 alloc 0xffff955943aacc00
Oct 07 16:31:15 alloc 0xffff955943aa8c00
Oct 07 16:31:30 alloc 0xffff955943aaf600
Oct 07 16:31:40 free 0xffff955943aa9800
Oct 07 16:31:40 free 0xffff955943aacc00
Oct 07 16:31:40 free 0xffff955943aa8c00
Oct 07 16:31:40 free 0xffff955943aaf600
Oct 07 16:31:40 alloc 0xffff9558ec8f7e00
Oct 07 16:31:40 free 0xffff9558ec8f7e00
Oct 07 16:31:40 alloc 0xffff9558ec8f7800
Oct 07 16:31:40 free 0xffff9558ec8f7800
Oct 07 16:31:40 alloc 0xffff9558ec8f7e00
Oct 07 16:31:40 free 0xffff9558ec8f7e00
Oct 07 16:31:40 alloc 0xffff9558ec8f7800
Oct 07 16:31:40 free 0xffff9558ec8f7800

root@dc00-pb003-t106-n078:~# date;time for x in $(seq 1 2); do ipmitool mc info; done
Fri Oct 7 16:30:45 CST 2022
No data available
Get Device ID command failed
No data available
No data available
No valid response received
Get Device ID command failed: Unspecified error
Get Device ID command failed: 0xd5 Command not supported in present state
Get Device ID command failed: Command not supported in present state

real 0m55.038s
user 0m0.001s
sys 0m0.001s

Signed-off-by: Zhang Yuchen
---
 drivers/char/ipmi/ipmi_kcs_sm.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_kcs_sm.c b/drivers/char/ipmi/ipmi_kcs_sm.c
index efda90dcf5b3..e7f2cd10e5a6 100644
--- a/drivers/char/ipmi/ipmi_kcs_sm.c
+++ b/drivers/char/ipmi/ipmi_kcs_sm.c
@@ -122,10 +122,10 @@ struct si_sm_data {
 	unsigned long error0_timeout;
 };
 
-static unsigned int init_kcs_data(struct si_sm_data *kcs,
-				  struct si_sm_io *io)
+static unsigned int init_kcs_data_with_state(struct si_sm_data *kcs,
+				  struct si_sm_io *io, enum kcs_states state)
 {
-	kcs->state = KCS_IDLE;
+	kcs->state = state;
 	kcs->io = io;
 	kcs->write_pos = 0;
 	kcs->write_count = 0;
@@ -140,6 +140,12 @@ static unsigned int init_kcs_data(struct si_sm_data *kcs,
 	return 2;
 }
 
+static unsigned int init_kcs_data(struct si_sm_data *kcs,
+				  struct si_sm_io *io)
+{
+	return init_kcs_data_with_state(kcs, io, KCS_IDLE);
+}
+
 static inline unsigned char read_status(struct si_sm_data *kcs)
 {
 	return kcs->io->inputb(kcs->io, 1);
@@ -495,7 +501,7 @@ static enum si_sm_result kcs_event(struct si_sm_data *kcs, long time)
 	}
 
 	if (kcs->state == KCS_HOSED) {
-		init_kcs_data(kcs, kcs->io);
+		init_kcs_data_with_state(kcs, kcs->io, KCS_ERROR0);
 		return SI_SM_HOSED;
 	}
 
-- 
2.30.2

From nobody Mon Apr 6 22:52:07 2026
From: Zhang Yuchen
To: minyard@acm.org
Cc: openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org, qi.zheng@linux.dev, Zhang Yuchen
Subject: [PATCH 2/3] ipmi: fix long wait in unload when IPMI disconnect
Date: Fri, 7 Oct 2022 17:26:16 +0800
Message-Id: <20221007092617.87597-3-zhangyuchen.lcr@bytedance.com>
In-Reply-To: <20221007092617.87597-1-zhangyuchen.lcr@bytedance.com>
References: <20221007092617.87597-1-zhangyuchen.lcr@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

While fixing the problem described in PATCH 1, we also found the
following problem:

If the IPMI is disconnected while a message is being sent, unloading the
driver gets stuck for a long time.

The main cause is that unloading the driver waits for curr_msg to be sent
or for the state to become HOSED. After the tasklet is stopped, the only
place that drives the timeout mechanism is the polling loop in
shutdown_smi(). The poll() function delays 10us and calls
smi_event_handler(smi_info, 10), and smi_event_handler() deducts 10us
from kcs->ibf_timeout.

However, poll() is followed by schedule_timeout_uninterruptible(1), and
the time consumed there is not counted against kcs->ibf_timeout. So each
time 10us is deducted from kcs->ibf_timeout, at least 1 jiffy has
actually passed, and the wait becomes more than a hundred times longer.

Now, instead of calling poll(), call smi_event_handler() directly and
pass it the elapsed time.

For verification, you can use eBPF to print kcs->ibf_timeout on each call
to kcs_event() while the IPMI is disconnected. It decrements at the
normal rate before unloading, and very slowly after unloading starts.
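
The size of the slowdown described above can be estimated with a little
arithmetic. The sketch below is user-space C, not driver code:
usec_per_jiffy() and real_wait_usec() are hypothetical helpers, the 1s
timeout budget is an arbitrary example value, and the tick rate is passed
as a parameter because HZ depends on the kernel configuration.

```c
/* Microseconds in one jiffy for a given tick rate (HZ). */
static long usec_per_jiffy(long hz)
{
    return 1000000L / hz;
}

/*
 * Real time, in microseconds, before a timeout budget of budget_us runs
 * out when each loop iteration charges only charged_us against the
 * budget but really lasts charged_us plus one jiffy of sleep.
 */
static long long real_wait_usec(long long budget_us, long charged_us, long hz)
{
    /* Number of iterations needed to use up the budget (rounded up). */
    long long iterations = (budget_us + charged_us - 1) / charged_us;

    return iterations * (charged_us + usec_per_jiffy(hz));
}
```

With 10us charged per iteration, a 1s budget takes
real_wait_usec(1000000, 10, 1000) = 101000000us (101s) at HZ=1000 and
401s at HZ=250 in this model, which matches the "more than a hundred
times" estimate.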

$ bpftrace -e 'kprobe:kcs_event {printf("kcs->ibftimeout : %d\n", *(arg0+584));}'

Signed-off-by: Zhang Yuchen
---
 drivers/char/ipmi/ipmi_si_intf.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 6e357ad76f2e..abddd7e43a9a 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -2153,6 +2153,20 @@ static int __init init_ipmi_si(void)
 }
 module_init(init_ipmi_si);
 
+static void wait_msg_processed(struct smi_info *smi_info)
+{
+	unsigned long jiffies_now;
+	long time_diff;
+
+	while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
+		jiffies_now = jiffies;
+		time_diff = (((long)jiffies_now - (long)smi_info->last_timeout_jiffies)
+		     * SI_USEC_PER_JIFFY);
+		smi_event_handler(smi_info, time_diff);
+		schedule_timeout_uninterruptible(1);
+	}
+}
+
 static void shutdown_smi(void *send_info)
 {
 	struct smi_info *smi_info = send_info;
@@ -2187,16 +2201,13 @@ static void shutdown_smi(void *send_info)
 	 * in the BMC. Note that timers and CPU interrupts are off,
 	 * so no need for locks.
 	 */
-	while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
-		poll(smi_info);
-		schedule_timeout_uninterruptible(1);
-	}
+	wait_msg_processed(smi_info);
+
 	if (smi_info->handlers)
 		disable_si_irq(smi_info);
-	while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
-		poll(smi_info);
-		schedule_timeout_uninterruptible(1);
-	}
+
+	wait_msg_processed(smi_info);
+
 	if (smi_info->handlers)
 		smi_info->handlers->cleanup(smi_info->si_sm);
 
-- 
2.30.2

From nobody Mon Apr 6 22:52:07 2026
From: Zhang Yuchen
To: minyard@acm.org
Cc: openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org, qi.zheng@linux.dev, Zhang Yuchen
Subject: [PATCH 3/3] ipmi: fix memleak when unload ipmi driver
Date: Fri, 7 Oct 2022 17:26:17 +0800
Message-Id: <20221007092617.87597-4-zhangyuchen.lcr@bytedance.com>
In-Reply-To: <20221007092617.87597-1-zhangyuchen.lcr@bytedance.com>
References: <20221007092617.87597-1-zhangyuchen.lcr@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

After the IPMI disconnect problem occurred, memory use kept rising, so we
tried to unload the driver to free the memory. However, only part of the
memory was recovered after the driver was unloaded.

Using eBPF to hook the free functions, we found that neither ipmi_user
nor ipmi_smi_msg was freed; only ipmi_recv_msg was freed.

We found that the deliver_smi_err_response() call in cleanup_smi_msgs()
does the destroy processing on each message from the xmit_msg queue, but
it does not check the return value and does not free the ipmi_smi_msg.

deliver_smi_err_response() is called only from this location, so adding
the free handling there has no effect on other code.

To verify, use eBPF to trace the alloc and free functions.

$ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc rcv %p\n",retval);} kprobe:free_recv_msg {printf("free recv %p\n", arg0)} kretprobe:ipmi_alloc_smi_msg {printf("alloc smi %p\n", retval);} kprobe:free_smi_msg {printf("free smi %p\n",arg0)}'

Signed-off-by: Zhang Yuchen
---
 drivers/char/ipmi/ipmi_msghandler.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index c8a3b208f923..7a7534046b5b 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -3710,12 +3710,15 @@ static void deliver_smi_err_response(struct ipmi_smi *intf,
 				     struct ipmi_smi_msg *msg,
 				     unsigned char err)
 {
+	int rv;
 	msg->rsp[0] = msg->data[0] | 4;
 	msg->rsp[1] = msg->data[1];
 	msg->rsp[2] = err;
 	msg->rsp_size = 3;
 	/* It's an error, so it will never requeue, no need to check return. */
-	handle_one_recv_msg(intf, msg);
+	rv = handle_one_recv_msg(intf, msg);
+	if (rv == 0)
+		ipmi_free_smi_msg(msg);
 }
 
 static void cleanup_smi_msgs(struct ipmi_smi *intf)
-- 
2.30.2
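
The ownership rule that PATCH 3/3 adds to deliver_smi_err_response() can
be sketched in user space. Everything below is a hypothetical stand-in
rather than driver code (struct fake_msg, msg_alloc(), msg_free(),
handle_one(), deliver_err_leaky(), deliver_err_fixed() and the live_msgs
counter), and the sketch assumes, as the rv == 0 check in the diff
suggests, that a return value of 0 leaves the message with the caller.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct ipmi_smi_msg. */
struct fake_msg {
    int handler_keeps_it;   /* nonzero: the handler takes ownership */
};

static int live_msgs;       /* messages allocated and not yet freed */

static struct fake_msg *msg_alloc(int handler_keeps_it)
{
    struct fake_msg *m = malloc(sizeof(*m));

    if (m) {
        m->handler_keeps_it = handler_keeps_it;
        live_msgs++;
    }
    return m;
}

static void msg_free(struct fake_msg *m)
{
    free(m);
    live_msgs--;
}

/* Stand-in for handle_one_recv_msg(): 0 means the caller still owns m. */
static int handle_one(struct fake_msg *m)
{
    return m->handler_keeps_it;
}

/* Old behaviour: the return value is ignored, so m is never freed. */
static void deliver_err_leaky(struct fake_msg *m)
{
    (void)handle_one(m);
}

/* Patched behaviour: free the message when the handler did not keep it. */
static void deliver_err_fixed(struct fake_msg *m)
{
    if (handle_one(m) == 0)
        msg_free(m);
}
```

Calling deliver_err_leaky() on a message from msg_alloc(0) leaves
live_msgs at 1, while deliver_err_fixed() brings it back to 0, which is
the difference the alloc/free trace is meant to show.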