From nobody Mon Nov 25 13:23:50 2024 Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09171262A3; Sun, 27 Oct 2024 18:06:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.180.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730052406; cv=none; b=mdiR7Z3DbZ/FLJ7ZhhaDm85Mz5PCO0veplanmE1Qr116CfDGPxyj7leX4BTLRLdK4ZpIaEmkyyiHrpeaYWFREv/fhv1SmE7Gsh8nGjncXIPktZEhiPD6pus6+73qCm79z39U9vjXm8V8z4eqxAz2GUw/Y5ncnXvcgpw0XqZZfvA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730052406; c=relaxed/simple; bh=4+0NulD4EEQ5BTETeq/86MPXoadMca0AwCvmywI7SSY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-ID:To:CC; b=ORV6TTQi4VMoS9Lhzqooq2fvpiRp6tSkicRXf+KJ5puFpQXNfl5AuKCnEumtW5XYSKAQlkdDACjDBafG9Gyu56ZCMjVsm3Av3HulKEo34G3cKj3QWzF2F776xn01B3VP/L9HTN4LLL3qcJ5LBORLgyVxhsUiGjJ91adQUhy+Gd4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=quicinc.com; spf=pass smtp.mailfrom=quicinc.com; dkim=pass (2048-bit key) header.d=quicinc.com header.i=@quicinc.com header.b=LPpfUFJV; arc=none smtp.client-ip=205.220.180.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=quicinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=quicinc.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=quicinc.com header.i=@quicinc.com header.b="LPpfUFJV" Received: from pps.filterd (m0279872.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49RFo3l0012317; Sun, 27 Oct 2024 18:06:26 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h= cc:content-transfer-encoding:content-type:date:from:message-id :mime-version:subject:to; s=qcppdkim1; bh=5eLO2C1SYyEAk2YPDJc1FV TLIAHKQlZLKgQt46vMoFQ=; b=LPpfUFJVv+S6IScpi1lQX3XnL+7s5tXrI68zhm sROHoGVo+Y0cuYjTholK7OwguqX1lZqLCDZctFHognbFnvxt0DPKEIoDLoPX/WW+ fqHsddLeeDs1jSxQscSC0QhNxUcBcl3fnr51K2TbwbCTzi71XcCPUJsYh26fDTyB jmWdd8W3ShoyvJDdsk6U2yauT9q3U6tQPCYFw6YNIz17IwzJ0NYXYe/OagEjPSBQ Gbag3ttuDi7UtPLYC9ZB/wmDw1lta99hlH7MWjDVX1heTwDGGEHy8hzPyX9ZPedz tgRxSs5agRflJpu2NvoUF8wc8V/7sxZiUNMus1DQfQtpe4pA== Received: from nalasppmta05.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 42grt6tq4a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 27 Oct 2024 18:06:26 +0000 (GMT) Received: from nalasex01a.na.qualcomm.com (nalasex01a.na.qualcomm.com [10.47.209.196]) by NALASPPMTA05.qualcomm.com (8.18.1.2/8.18.1.2) with ESMTPS id 49RI6PKC030401 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 27 Oct 2024 18:06:25 GMT Received: from [10.213.111.143] (10.80.80.8) by nalasex01a.na.qualcomm.com (10.47.209.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.9; Sun, 27 Oct 2024 11:06:19 -0700 From: Akhil P Oommen Date: Sun, 27 Oct 2024 23:35:47 +0530 Subject: [PATCH] drm/msm/a6xx: Fix excessive stack usage Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-ID: <20241027-stack-size-fix-v1-1-764e2e3566cb@quicinc.com> X-B4-Tracking: v=1; b=H4sIAPqAHmcC/yWMwQ6DIBAFf4XsuSQICOivNB4Al3bTqBVs09T47 yX18A4zyZsdCmbCAj3bIeObCi1zhebCIN79fENOY2WQQuqmjpfNxwcv9EWe6MOl88n6MKrkA9T TM2PV/+B1ODnj+qrd7ZQQfEEel2mirWdWR2W10Q5H4TAFiUmpzrQmSWt0lC4KbbrWwHAcPxT9J cSsAAAA To: Rob Clark , Sean Paul , "Konrad Dybcio" , Abhinav Kumar , Dmitry Baryshkov , Marijn Suijten , David Airlie , "Simona Vetter" , Nathan Chancellor , "Nick Desaulniers" , Bill Wendling , Justin Stitt CC: , , , , , Arnd Bergmann , Akhil P Oommen X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=ed25519-sha256; t=1730052379; l=3569; i=quic_akhilpo@quicinc.com; s=20240726; h=from:subject:message-id; bh=4+0NulD4EEQ5BTETeq/86MPXoadMca0AwCvmywI7SSY=; b=qJoax2Z2hTUQV4lstsski2oOlsORtBZY7vZ2fOfCUs/BduAPLP1/2H73fYix0YLuFj33ozm/G cFTvuw0eZryA4UyFQf0K0O+7Awdno/9IvNOe8aDjOTumkGmAqnKzNuQ X-Developer-Key: i=quic_akhilpo@quicinc.com; a=ed25519; pk=lmVtttSHmAUYFnJsQHX80IIRmYmXA4+CzpGcWOOsfKA= X-ClientProxiedBy: nasanex01b.na.qualcomm.com (10.46.141.250) To nalasex01a.na.qualcomm.com (10.47.209.196) X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-ORIG-GUID: VkbfgtwSZ3MAxdf5ovw3s8cmgryfRbg9 X-Proofpoint-GUID: VkbfgtwSZ3MAxdf5ovw3s8cmgryfRbg9 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.60.29 definitions=2024-09-06_09,2024-09-06_01,2024-09-02_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 bulkscore=0 adultscore=0 clxscore=1015 impostorscore=0 malwarescore=0 priorityscore=1501 phishscore=0 mlxscore=0 suspectscore=0 mlxlogscore=999 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2409260000 definitions=main-2410270159 Clang-19 and above sometimes end up with multiple copies of the large a6xx_hfi_msg_bw_table structure on the stack. The problem is that a6xx_hfi_send_bw_table() calls a number of device specific functions to fill the structure, but these create another copy of the structure on the stack which gets copied to the first. If the functions get inlined, that busts the warning limit: drivers/gpu/drm/msm/adreno/a6xx_hfi.c:631:12: error: stack frame size (1032= ) exceeds limit (1024) in 'a6xx_hfi_send_bw_table' [-Werror,-Wframe-larger-= than] Fix this by kmalloc-ating struct a6xx_hfi_msg_bw_table instead of using the stack. Also, use this opportunity to skip re-initializing this table to optimize gpu wake up latency. Cc: Arnd Bergmann Signed-off-by: Akhil P Oommen Reviewed-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 1 + drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 34 ++++++++++++++++++++++---------= --- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/ad= reno/a6xx_gmu.h index 94b6c5cab6f4..b4a79f88ccf4 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h @@ -99,6 +99,7 @@ struct a6xx_gmu { struct completion pd_gate; =20 struct qmp *qmp; + struct a6xx_hfi_msg_bw_table *bw_table; }; =20 static inline u32 gmu_read(struct a6xx_gmu *gmu, u32 offset) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/ad= reno/a6xx_hfi.c index cdb3f6e74d3e..55e51c81be1f 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c @@ -630,32 +630,42 @@ static void a6xx_build_bw_table(struct a6xx_hfi_msg_b= w_table *msg) =20 static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu) { - struct a6xx_hfi_msg_bw_table msg =3D { 0 }; + struct a6xx_hfi_msg_bw_table *msg; struct a6xx_gpu *a6xx_gpu =3D container_of(gmu, struct a6xx_gpu, gmu); struct adreno_gpu *adreno_gpu =3D &a6xx_gpu->base; =20 + if (gmu->bw_table) + goto send; + + msg =3D devm_kzalloc(gmu->dev, sizeof(*msg), GFP_KERNEL); + if (!msg) + return -ENOMEM; + if (adreno_is_a618(adreno_gpu)) - a618_build_bw_table(&msg); + a618_build_bw_table(msg); else if (adreno_is_a619(adreno_gpu)) - a619_build_bw_table(&msg); + a619_build_bw_table(msg); else if (adreno_is_a640_family(adreno_gpu)) - a640_build_bw_table(&msg); + a640_build_bw_table(msg); else if (adreno_is_a650(adreno_gpu)) - a650_build_bw_table(&msg); + a650_build_bw_table(msg); else if (adreno_is_7c3(adreno_gpu)) - adreno_7c3_build_bw_table(&msg); + adreno_7c3_build_bw_table(msg); else if (adreno_is_a660(adreno_gpu)) - a660_build_bw_table(&msg); + a660_build_bw_table(msg); else if (adreno_is_a690(adreno_gpu)) - a690_build_bw_table(&msg); + a690_build_bw_table(msg); else if (adreno_is_a730(adreno_gpu)) - a730_build_bw_table(&msg); + a730_build_bw_table(msg); else if (adreno_is_a740_family(adreno_gpu)) - a740_build_bw_table(&msg); + a740_build_bw_table(msg); else - a6xx_build_bw_table(&msg); + a6xx_build_bw_table(msg); + + gmu->bw_table =3D msg; =20 - return a6xx_hfi_send_msg(gmu, HFI_H2F_MSG_BW_TABLE, &msg, sizeof(msg), +send: + return a6xx_hfi_send_msg(gmu, HFI_H2F_MSG_BW_TABLE, gmu->bw_table, sizeof= (*(gmu->bw_table)), NULL, 0); } =20 --- base-commit: 74c374648ed08efb2ef339656f2764c28c046956 change-id: 20241024-stack-size-fix-28af7abd3fab Best regards, --=20 Akhil P Oommen