From: AngeloGioacchino Del Regno
To: chaotian.jing@mediatek.com
Cc: ulf.hansson@linaro.org, matthias.bgg@gmail.com,
 linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org,
 linux-mediatek@lists.infradead.org, kernel@collabora.com,
 AngeloGioacchino Del Regno
Subject: [PATCH] mmc: mtk-sd: Implement Host Software Queue for eMMC and SD Card
Date: Wed, 25 Sep 2024 13:39:49 +0200
Message-ID: <20240925113949.149655-1-angelogioacchino.delregno@collabora.com>
X-Mailer: git-send-email 2.46.1
X-Mailing-List: linux-kernel@vger.kernel.org

Add support for Host Software Queue (HSQ) and enable it when the
controller instance does not have Command Queue Engine HW support.

It was chosen to enable HSQ only for eMMC and SD/MicroSD cards and not
for SDIO as performance improvements are seen only for the former.

Performance was measured with a SanDisk Extreme Ultra A2 MicroSD card
in a MediaTek MT8195T Acer Chromebook Spin 513 (CP513-2H), by running
FIO (bs=4k) on an ArchLinux userspace.

.... Summarizing ....
Random read:       +24.28% IOPS, +24.29% BW
Sequential read:   +3.14% IOPS, +3.49% BW
Random RW (avg):   +50.53% IOPS, +50.68% BW

Below, more data from the benchmarks.
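
The summary percentages can be cross-checked against the headline FIO
figures quoted in this message. The following is only a minimal sketch:
the helper names (pct_gain, avg_pct_gain, print_summary) are hypothetical
and not part of the patch, and it assumes that "Random RW (avg)" is the
mean of the read-side and write-side gains, which the message does not
state explicitly.

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
double pct_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}

/*
 * Assumption: the "Random RW (avg)" figure is the mean of the read
 * gain and the write gain.
 */
double avg_pct_gain(double rd_before, double rd_after,
		    double wr_before, double wr_after)
{
	return (pct_gain(rd_before, rd_after) +
		pct_gain(wr_before, wr_after)) / 2.0;
}

/* Recompute the summary from the headline IOPS/BW values in this message. */
void print_summary(void)
{
	/* Random read: IOPS 1643 -> 2042, BW 6574 -> 8171 KiB/s */
	printf("Random read:     %+.2f%% IOPS, %+.2f%% BW\n",
	       pct_gain(1643, 2042), pct_gain(6574, 8171));

	/* Sequential read: IOPS 19.1k -> 19.7k, BW 74.4 -> 77.0 MiB/s */
	printf("Sequential read: %+.2f%% IOPS, %+.2f%% BW\n",
	       pct_gain(19.1, 19.7), pct_gain(74.4, 77.0));

	/* Random R/W: read 282 -> 424 IOPS, write 284 -> 428 IOPS, etc. */
	printf("Random RW (avg): %+.2f%% IOPS, %+.2f%% BW\n",
	       avg_pct_gain(282, 424, 284, 428),
	       avg_pct_gain(1129, 1699, 1136, 1714));
}
```

Calling print_summary() from a small main() should reproduce the three
summary lines. Because the inputs are the rounded headline values (for
example 19.1k and 19.7k), the result is only as precise as those figures.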

Before:
 - Random read: IOPS=1643, BW=6574KiB/s
   bw (  KiB/s): min= 4578, max= 7440, per=99.95%, avg=6571.55, stdev=74.16, samples=953
   iops        : min= 1144, max= 1860, avg=1642.14, stdev=18.54, samples=953
   lat (msec)  : 100=0.01%, 250=0.12%, 500=0.38%, 750=97.89%, 1000=1.44%, 2000=0.16%
 - Sequential read: IOPS=19.1k, BW=74.4MiB/s
   bw (  KiB/s): min=12288, max=118483, per=100.00%, avg=76293.38, stdev=1971.42, samples=956
   iops        : min= 3072, max=29620, avg=19072.14, stdev=492.87, samples=956
   lat (msec)  : 4=0.01%, 10=0.01%, 20=0.21%, 50=23.95%, 100=75.67%, 250=0.05%, 500=0.03%, 750=0.08%
 - Random R/W: read: IOPS=282, BW=1129KiB/s (1156kB/s) write: IOPS=284, BW=1136KiB/s
   read bw (  KiB/s): min=   31, max= 3496, per=100.00%, avg=1703.67, stdev=155.42, samples=630
   read iops        : min=    7, max=  873, avg=425.22, stdev=38.85, samples=630
   wri  bw (  KiB/s): min=   31, max= 3443, per=100.00%, avg=1674.27, stdev=164.23, samples=644
   wri  iops        : min=    7, max=  860, avg=417.87, stdev=41.03, samples=644
   lat (msec)  : 250=0.13%, 500=0.44%, 750=0.84%, 1000=22.29%, 2000=74.01%, >=2000=2.30%

After:
 - Random read: IOPS=2042, BW=8171KiB/s
   bw (  KiB/s): min= 4907, max= 9072, per=99.94%, avg=8166.80, stdev=93.77, samples=954
   iops        : min= 1226, max= 2268, avg=2040.78, stdev=23.41, samples=954
   lat (msec)  : 100=0.03%, 250=0.13%, 500=52.88%, 750=46.64%, 1000=0.32%
 - Sequential read: IOPS=19.7k, BW=77.0MiB/s
   bw (  KiB/s): min=67980, max=94248, per=100.00%, avg=78894.27, stdev=1475.07, samples=956
   iops        : min=16994, max=23562, avg=19722.45, stdev=368.76, samples=956
   lat (msec)  : 4=0.01%, 10=0.01%, 20=0.05%, 50=28.78%, 100=71.14%, 250=0.01%, 500=0.02%
 - Random R/W: read: IOPS=424, BW=1699KiB/s write: IOPS=428, BW=1714KiB/s
   read bw (  KiB/s): min=  228, max= 2856, per=100.00%, avg=1796.60, stdev=112.59, samples=901
   read iops        : min=   54, max=  712, avg=447.81, stdev=28.21, samples=901
   wri  bw (  KiB/s): min=   28, max= 2904, per=100.00%, avg=1780.11, stdev=128.27, samples=916
   wri  iops        : min=    4, max=  724, avg=443.69, stdev=32.14, samples=916

Signed-off-by: AngeloGioacchino Del Regno
---
 drivers/mmc/host/mtk-sd.c | 49 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c
index 5165a33bf74b..57ae5840696f 100644
--- a/drivers/mmc/host/mtk-sd.c
+++ b/drivers/mmc/host/mtk-sd.c
@@ -33,6 +33,7 @@
 #include
 
 #include "cqhci.h"
+#include "mmc_hsq.h"
 
 #define MAX_BD_NUM          1024
 #define MSDC_NR_CLOCKS      3
@@ -473,6 +474,7 @@ struct msdc_host {
 	bool hs400_tuning;	/* hs400 mode online tuning */
 	bool internal_cd;	/* Use internal card-detect logic */
 	bool cqhci;		/* support eMMC hw cmdq */
+	bool hsq_en;		/* Host Software Queue is enabled */
 	struct msdc_save_para save_para; /* used when gate HCLK */
 	struct msdc_tune_para def_tune_para; /* default tune setting */
 	struct msdc_tune_para saved_tune_para; /* tune result of CMD21/CMD19 */
@@ -1170,7 +1172,9 @@ static void msdc_track_cmd_data(struct msdc_host *host, struct mmc_command *cmd)
 
 static void msdc_request_done(struct msdc_host *host, struct mmc_request *mrq)
 {
+	struct mmc_host *mmc = mmc_from_priv(host);
 	unsigned long flags;
+	bool hsq_req_done;
 
 	/*
 	 * No need check the return value of cancel_delayed_work, as only ONE
@@ -1178,6 +1182,27 @@ static void msdc_request_done(struct msdc_host *host, struct mmc_request *mrq)
 	 */
 	cancel_delayed_work(&host->req_timeout);
 
+	/*
+	 * If the request was handled from Host Software Queue, there's almost
+	 * nothing to do here, and we also don't need to reset mrq as any race
+	 * condition would not have any room to happen, since HSQ stores the
+	 * "scheduled" mrqs in an internal array of mrq slots anyway.
+	 * However, if the controller experienced an error, we still want to
+	 * reset it as soon as possible.
+	 *
+	 * Note that non-HSQ requests will still be happening at times, even
+	 * though it is enabled, and that's what is going to reset host->mrq.
+	 * Also, msdc_unprepare_data() is going to be called by HSQ when needed
+	 * as HSQ request finalization will eventually call the .post_req()
+	 * callback of this driver which, in turn, unprepares the data.
+	 */
+	hsq_req_done = host->hsq_en ? mmc_hsq_finalize_request(mmc, mrq) : false;
+	if (hsq_req_done) {
+		if (host->error)
+			msdc_reset_hw(host);
+		return;
+	}
+
 	spin_lock_irqsave(&host->lock, flags);
 	host->mrq = NULL;
 	spin_unlock_irqrestore(&host->lock, flags);
@@ -1187,7 +1212,7 @@ static void msdc_request_done(struct msdc_host *host, struct mmc_request *mrq)
 		msdc_unprepare_data(host, mrq->data);
 	if (host->error)
 		msdc_reset_hw(host);
-	mmc_request_done(mmc_from_priv(host), mrq);
+	mmc_request_done(mmc, mrq);
 	if (host->dev_comp->recheck_sdio_irq)
 		msdc_recheck_sdio_irq(host);
 }
@@ -1347,7 +1372,7 @@ static void msdc_ops_request(struct mmc_host *mmc, struct mmc_request *mrq)
 	struct msdc_host *host = mmc_priv(mmc);
 
 	host->error = 0;
-	WARN_ON(host->mrq);
+	WARN_ON(!host->hsq_en && host->mrq);
 	host->mrq = mrq;
 
 	if (mrq->data)
@@ -2916,6 +2941,19 @@ static int msdc_drv_probe(struct platform_device *pdev)
 		mmc->max_seg_size = 64 * 1024;
 		/* Reduce CIT to 0x40 that corresponds to 2.35us */
 		msdc_cqe_cit_cal(host, 2350);
+	} else if (mmc->caps2 & MMC_CAP2_NO_SDIO) {
+		/* Use HSQ on eMMC/SD (but not on SDIO) if HW CQE not supported */
+		struct mmc_hsq *hsq = devm_kzalloc(&pdev->dev, sizeof(*hsq), GFP_KERNEL);
+		if (!hsq) {
+			ret = -ENOMEM;
+			goto release;
+		}
+
+		ret = mmc_hsq_init(hsq, mmc);
+		if (ret)
+			goto release;
+
+		host->hsq_en = true;
 	}
 
 	ret = devm_request_irq(&pdev->dev, host->irq, msdc_irq,
@@ -3043,6 +3081,9 @@ static int __maybe_unused msdc_runtime_suspend(struct device *dev)
 	struct mmc_host *mmc = dev_get_drvdata(dev);
 	struct msdc_host *host = mmc_priv(mmc);
 
+	if (host->hsq_en)
+		mmc_hsq_suspend(mmc);
+
 	msdc_save_reg(host);
 
 	if (sdio_irq_claimed(mmc)) {
@@ -3073,6 +3114,10 @@ static int __maybe_unused msdc_runtime_resume(struct device *dev)
 		pinctrl_select_state(host->pinctrl, host->pins_uhs);
 		enable_irq(host->irq);
 	}
+
+	if (host->hsq_en)
+		mmc_hsq_resume(mmc);
+
 	return 0;
 }
 
-- 
2.46.1