From nobody Sun Feb 8 21:09:23 2026
From: thunder.leizhen@huaweicloud.com
To: Will Deacon, Robin Murphy, Joerg Roedel, iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Zhen Lei, Tanmay Jagdale, Jonathan Cameron
Subject: [PATCH v2 1/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
Date: Wed, 9 Aug 2023 21:13:02 +0800
Message-Id: <20230809131303.1355-2-thunder.leizhen@huaweicloud.com>
In-Reply-To: <20230809131303.1355-1-thunder.leizhen@huaweicloud.com>
References: <20230809131303.1355-1-thunder.leizhen@huaweicloud.com>

From: Zhen Lei

Ensure that each core exclusively occupies an ECMDQ and that all ECMDQs
are enabled during initialization. Any error during this initialization
results in a fallback to the normal CMDQ. When GERROR is triggered by an
ECMDQ, all ECMDQs are traversed: those with errors are processed and
those without errors are skipped.
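For reference, the two behaviours above boil down to the following shape,
condensed from the hunks in this patch (locals elided; the ERRACK
acknowledge step is described in the next paragraph):

	/* Fast path: each core uses its own ECMDQ while the feature is healthy */
	static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
	{
		if (smmu->ecmdq_enabled)
			return &(*this_cpu_ptr(smmu->ecmdqs))->cmdq;

		return &smmu->cmdq;		/* fallback: the normal shared CMDQ */
	}

	/* GERROR path: visit every ECMDQ, act only on the ones with ERR set */
	for (i = 0; i < smmu->nr_ecmdq; i++) {
		q = &(*per_cpu_ptr(smmu->ecmdqs, i))->cmdq.q;

		prod = readl_relaxed(q->prod_reg);
		cons = readl_relaxed(q->cons_reg);
		if (((prod ^ cons) & ECMDQ_CONS_ERR) == 0)
			continue;		/* no error on this queue, skip it */

		__arm_smmu_cmdq_skip_err(smmu, q);
	}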
Compared with register SMMU_CMDQ_PROD, register SMMU_ECMDQ_PROD has an
additional 'EN' bit and an additional 'ERRACK' bit. After the error
indicated by SMMU_GERROR.CMDQP_ERR is fixed, the 'ERRACK' bit needs to
be toggled to resume the corresponding ECMDQ. To protect the write to
the 'ERRACK' bit during error handling against the read of that bit
during command insertion without taking a lock, send an IPI to the
faulty CPU and perform the toggle there. Command insertion runs under
local_irq_save(), so there is no race.

Signed-off-by: Zhen Lei
---
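The two sides of that handshake, trimmed from the hunks below purely for
illustration:

	/* IPI callback: runs on the faulting CPU, in interrupt context */
	static void arm_smmu_ecmdq_err_ack(void *info)
	{
		struct arm_smmu_queue *q = info;
		u32 prod, cons;

		prod = readl_relaxed(q->prod_reg);
		cons = readl_relaxed(q->cons_reg);
		prod &= ~ECMDQ_PROD_ERRACK;
		prod |= cons & ECMDQ_CONS_ERR;	/* mirror ERR into ERRACK to resume */
		writel(prod, q->prod_reg);
	}

	/* GERROR handler: make the owning CPU flip ERRACK itself */
	smp_call_function_single(i, arm_smmu_ecmdq_err_ack, q, true);

Because arm_smmu_cmdq_issue_cmdlist() only writes SMMU_ECMDQ_PROD inside
its local_irq_save() section, the IPI callback and the insertion path
cannot interleave on that CPU, so no extra lock is required.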
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 219 +++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  33 +++
 2 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9b0dc35056019e0..c64b34be8eb9181 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -347,6 +347,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 
 static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
 {
+	if (smmu->ecmdq_enabled) {
+		struct arm_smmu_ecmdq *ecmdq;
+
+		ecmdq = *this_cpu_ptr(smmu->ecmdqs);
+
+		return &ecmdq->cmdq;
+	}
+
 	return &smmu->cmdq;
 }
 
@@ -429,6 +437,43 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 	__arm_smmu_cmdq_skip_err(smmu, &smmu->cmdq.q);
 }
 
+static void arm_smmu_ecmdq_err_ack(void *info)
+{
+	u32 prod, cons;
+	struct arm_smmu_queue *q = info;
+
+	prod = readl_relaxed(q->prod_reg);
+	cons = readl_relaxed(q->cons_reg);
+	prod &= ~ECMDQ_PROD_ERRACK;
+	prod |= cons & ECMDQ_CONS_ERR;
+	writel(prod, q->prod_reg);
+}
+
+static void arm_smmu_ecmdq_skip_err(struct arm_smmu_device *smmu)
+{
+	int i;
+	u32 prod, cons;
+	struct arm_smmu_queue *q;
+	struct arm_smmu_ecmdq *ecmdq;
+
+	if (!smmu->ecmdq_enabled)
+		return;
+
+	for (i = 0; i < smmu->nr_ecmdq; i++) {
+		ecmdq = *per_cpu_ptr(smmu->ecmdqs, i);
+		q = &ecmdq->cmdq.q;
+
+		prod = readl_relaxed(q->prod_reg);
+		cons = readl_relaxed(q->cons_reg);
+		if (((prod ^ cons) & ECMDQ_CONS_ERR) == 0)
+			continue;
+
+		__arm_smmu_cmdq_skip_err(smmu, q);
+
+		smp_call_function_single(i, arm_smmu_ecmdq_err_ack, q, true);
+	}
+}
+
 /*
  * Command queue locking.
  * This is a form of bastardised rwlock with the following major changes:
@@ -825,7 +870,10 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	 * d. Advance the hardware prod pointer
 	 * Control dependency ordering from the entries becoming valid.
 	 */
-	writel_relaxed(prod, cmdq->q.prod_reg);
+	if (smmu->ecmdq_enabled)
+		writel_relaxed(prod | ECMDQ_PROD_EN, cmdq->q.prod_reg);
+	else
+		writel_relaxed(prod, cmdq->q.prod_reg);
 
 	/*
 	 * e. Tell the next owner we're done
@@ -1701,6 +1749,9 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
 	if (active & GERROR_CMDQ_ERR)
 		arm_smmu_cmdq_skip_err(smmu);
 
+	if (active & GERROR_CMDQP_ERR)
+		arm_smmu_ecmdq_skip_err(smmu);
+
 	writel(gerror, smmu->base + ARM_SMMU_GERRORN);
 	return IRQ_HANDLED;
 }
@@ -2957,6 +3008,20 @@ static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static int arm_smmu_ecmdq_init(struct arm_smmu_cmdq *cmdq)
+{
+	unsigned int nents = 1 << cmdq->q.llq.max_n_shift;
+
+	atomic_set(&cmdq->owner_prod, 0);
+	atomic_set(&cmdq->lock, 0);
+
+	cmdq->valid_map = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
+	if (!cmdq->valid_map)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 {
 	int ret;
@@ -3305,6 +3370,36 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
 	return ret;
 }
 
+static void arm_smmu_ecmdq_reset(struct arm_smmu_device *smmu)
+{
+	u32 reg;
+	int i, ret;
+	struct arm_smmu_queue *q;
+	struct arm_smmu_ecmdq *ecmdq;
+
+	if (!smmu->ecmdq_enabled)
+		return;
+
+	for (i = 0; i < smmu->nr_ecmdq; i++) {
+		ecmdq = *per_cpu_ptr(smmu->ecmdqs, i);
+
+		q = &ecmdq->cmdq.q;
+		writeq_relaxed(q->q_base, ecmdq->base + ARM_SMMU_ECMDQ_BASE);
+		writel_relaxed(q->llq.prod, ecmdq->base + ARM_SMMU_ECMDQ_PROD);
+		writel_relaxed(q->llq.cons, ecmdq->base + ARM_SMMU_ECMDQ_CONS);
+
+		/* enable ecmdq */
+		writel(ECMDQ_PROD_EN, q->prod_reg);
+		ret = readl_relaxed_poll_timeout(q->cons_reg, reg, reg & ECMDQ_CONS_ENACK,
+						 1, ARM_SMMU_POLL_TIMEOUT_US);
+		if (ret) {
+			dev_err(smmu->dev, "ecmdq[%d] enable failed\n", i);
+			smmu->ecmdq_enabled = false;
+			break;
+		}
+	}
+}
+
 static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 {
 	int ret;
@@ -3359,6 +3454,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		return ret;
 	}
 
+	arm_smmu_ecmdq_reset(smmu);
+
 	/* Invalidate any cached configuration */
 	cmd.opcode = CMDQ_OP_CFGI_ALL;
 	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
@@ -3476,6 +3573,112 @@ static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu)
 		}
 		break;
 	}
+};
+
+static int arm_smmu_ecmdq_layout(struct arm_smmu_device *smmu)
+{
+	int cpu;
+	struct arm_smmu_ecmdq __percpu *ecmdq;
+
+	if (num_possible_cpus() <= smmu->nr_ecmdq) {
+		ecmdq = devm_alloc_percpu(smmu->dev, *ecmdq);
+		if (!ecmdq)
+			return -ENOMEM;
+
+		for_each_possible_cpu(cpu)
+			*per_cpu_ptr(smmu->ecmdqs, cpu) = per_cpu_ptr(ecmdq, cpu);
+
+		/* A core requires at most one ECMDQ */
+		smmu->nr_ecmdq = num_possible_cpus();
+
+		return 0;
+	}
+
+	return -ENOSPC;
+}
+
+static int arm_smmu_ecmdq_probe(struct arm_smmu_device *smmu)
+{
+	int ret, cpu;
+	u32 i, nump, numq, gap;
+	u32 reg, shift_increment;
+	u64 offset;
+	void __iomem *cp_regs, *cp_base;
+
+	/* IDR6 */
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR6);
+	nump = 1 << FIELD_GET(IDR6_LOG2NUMP, reg);
+	numq = 1 << FIELD_GET(IDR6_LOG2NUMQ, reg);
+	smmu->nr_ecmdq = nump * numq;
+	gap = ECMDQ_CP_RRESET_SIZE >> FIELD_GET(IDR6_LOG2NUMQ, reg);
+
+	cp_regs = ioremap(smmu->iobase + ARM_SMMU_ECMDQ_CP_BASE, PAGE_SIZE);
+	if (!cp_regs)
+		return -ENOMEM;
+
+	for (i = 0; i < nump; i++) {
+		u64 val, pre_addr = 0;
+
+		val = readq_relaxed(cp_regs + 32 * i);
+		if (!(val & ECMDQ_CP_PRESET)) {
+			iounmap(cp_regs);
+			dev_err(smmu->dev, "ecmdq control page %u is memory mode\n", i);
+			return -EFAULT;
+		}
+
+		if (i && ((val & ECMDQ_CP_ADDR) != (pre_addr + ECMDQ_CP_RRESET_SIZE))) {
+			iounmap(cp_regs);
+			dev_err(smmu->dev, "ecmdq_cp memory region is not contiguous\n");
+			return -EFAULT;
+		}
+
+		pre_addr = val & ECMDQ_CP_ADDR;
+	}
+
+	offset = readl_relaxed(cp_regs) & ECMDQ_CP_ADDR;
+	iounmap(cp_regs);
+
+	cp_base = devm_ioremap(smmu->dev, smmu->iobase + offset, ECMDQ_CP_RRESET_SIZE * nump);
+	if (!cp_base)
+		return -ENOMEM;
+
+	smmu->ecmdqs = devm_alloc_percpu(smmu->dev, struct arm_smmu_ecmdq *);
+	if (!smmu->ecmdqs)
+		return -ENOMEM;
+
+	ret = arm_smmu_ecmdq_layout(smmu);
+	if (ret)
+		return ret;
+
+	shift_increment = order_base_2(num_possible_cpus() / smmu->nr_ecmdq);
+
+	offset = 0;
+	for_each_possible_cpu(cpu) {
+		struct arm_smmu_ecmdq *ecmdq;
+		struct arm_smmu_queue *q;
+
+		ecmdq = *per_cpu_ptr(smmu->ecmdqs, cpu);
+		ecmdq->base = cp_base + offset;
+
+		q = &ecmdq->cmdq.q;
+
+		q->llq.max_n_shift = ECMDQ_MAX_SZ_SHIFT + shift_increment;
+		ret = arm_smmu_init_one_queue(smmu, q, ecmdq->base, ARM_SMMU_ECMDQ_PROD,
+				ARM_SMMU_ECMDQ_CONS, CMDQ_ENT_DWORDS, "ecmdq");
+		if (ret)
+			return ret;
+
+		ret = arm_smmu_ecmdq_init(&ecmdq->cmdq);
+		if (ret) {
+			dev_err(smmu->dev, "ecmdq[%d] init failed\n", i);
+			return ret;
+		}
+
+		offset += gap;
+	}
+	smmu->ecmdq_enabled = true;
+
+	return 0;
 }
 
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
@@ -3588,6 +3791,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 		return -ENXIO;
 	}
 
+	if (reg & IDR1_ECMDQ)
+		smmu->features |= ARM_SMMU_FEAT_ECMDQ;
+
 	/* Queue sizes, capped to ensure natural alignment */
 	smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT,
 					     FIELD_GET(IDR1_CMDQS, reg));
@@ -3695,6 +3901,16 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
+
+	if (smmu->features & ARM_SMMU_FEAT_ECMDQ) {
+		int err;
+
+		err = arm_smmu_ecmdq_probe(smmu);
+		if (err) {
+			dev_err(smmu->dev, "suppress ecmdq feature, errno=%d\n", err);
+			smmu->ecmdq_enabled = false;
+		}
+	}
 	return 0;
 }
 
@@ -3850,6 +4066,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
 	if (IS_ERR(smmu->base))
 		return PTR_ERR(smmu->base);
+	smmu->iobase = ioaddr;
 
 	if (arm_smmu_resource_size(smmu) > SZ_64K) {
 		smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index dcab85698a4e257..0f01798f7c4e30d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -41,6 +41,7 @@
 #define IDR0_S2P			(1 << 0)
 
 #define ARM_SMMU_IDR1			0x4
+#define IDR1_ECMDQ			(1 << 31)
 #define IDR1_TABLES_PRESET		(1 << 30)
 #define IDR1_QUEUES_PRESET		(1 << 29)
 #define IDR1_REL			(1 << 28)
@@ -113,6 +114,7 @@
 #define ARM_SMMU_IRQ_CTRLACK		0x54
 
 #define ARM_SMMU_GERROR			0x60
+#define GERROR_CMDQP_ERR		(1 << 9)
 #define GERROR_SFM_ERR			(1 << 8)
 #define GERROR_MSI_GERROR_ABT_ERR	(1 << 7)
 #define GERROR_MSI_PRIQ_ABT_ERR		(1 << 6)
@@ -158,6 +160,26 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
 #define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
 
+#define ARM_SMMU_IDR6			0x190
+#define IDR6_LOG2NUMP			GENMASK(27, 24)
+#define IDR6_LOG2NUMQ			GENMASK(19, 16)
+#define IDR6_BA_DOORBELLS		GENMASK(9, 0)
+
+#define ARM_SMMU_ECMDQ_BASE		0x00
+#define ARM_SMMU_ECMDQ_PROD		0x08
+#define ARM_SMMU_ECMDQ_CONS		0x0c
+#define ECMDQ_MAX_SZ_SHIFT		8
+#define ECMDQ_PROD_EN			(1 << 31)
+#define ECMDQ_CONS_ENACK		(1 << 31)
+#define ECMDQ_CONS_ERR			(1 << 23)
+#define ECMDQ_PROD_ERRACK		(1 << 23)
+
+#define ARM_SMMU_ECMDQ_CP_BASE		0x4000
+#define ECMDQ_CP_ADDR			GENMASK_ULL(51, 12)
+#define ECMDQ_CP_CMDQGS			GENMASK_ULL(2, 1)
+#define ECMDQ_CP_PRESET			(1UL << 0)
+#define ECMDQ_CP_RRESET_SIZE		0x10000
+
 #define ARM_SMMU_REG_SZ			0xe00
 
 /* Common MSI config fields */
@@ -552,6 +574,11 @@ struct arm_smmu_cmdq {
 	atomic_t			lock;
 };
 
+struct arm_smmu_ecmdq {
+	struct arm_smmu_cmdq		cmdq;
+	void __iomem			*base;
+};
+
 struct arm_smmu_cmdq_batch {
 	u64				cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
 	int				num;
@@ -625,6 +652,7 @@ struct arm_smmu_device {
 	struct device			*dev;
 	void __iomem			*base;
 	void __iomem			*page1;
+	phys_addr_t			iobase;
 
 #define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
 #define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
@@ -646,6 +674,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_SVA		(1 << 17)
 #define ARM_SMMU_FEAT_E2H		(1 << 18)
 #define ARM_SMMU_FEAT_NESTING		(1 << 19)
+#define ARM_SMMU_FEAT_ECMDQ		(1 << 20)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -654,6 +683,10 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 	u32				options;
 
+	struct arm_smmu_ecmdq *__percpu	*ecmdqs;
+	u32				nr_ecmdq;
+	bool				ecmdq_enabled;
+
 	struct arm_smmu_cmdq		cmdq;
 	struct arm_smmu_evtq		evtq;
 	struct arm_smmu_priq		priq;
-- 
2.34.1
From nobody Sun Feb 8 21:09:23 2026
From: thunder.leizhen@huaweicloud.com
To: Will Deacon, Robin Murphy, Joerg Roedel, iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Zhen Lei, Tanmay Jagdale, Jonathan Cameron
Subject: [PATCH v2 2/2] iommu/arm-smmu-v3: Ensure that a set of associated commands are inserted in the same ECMDQ
Date: Wed, 9 Aug 2023 21:13:03 +0800
Message-Id: <20230809131303.1355-3-thunder.leizhen@huaweicloud.com>
In-Reply-To: <20230809131303.1355-1-thunder.leizhen@huaweicloud.com>
References: <20230809131303.1355-1-thunder.leizhen@huaweicloud.com>

From: Zhen Lei

The SYNC command only guarantees that the commands preceding it in the
same ECMDQ have been executed; it cannot synchronize commands in other
ECMDQs. If an unmap involves multiple commands and the issuing task
migrates between cores, some of the commands are inserted on one core
and the rest on another. In that case, completion of the SYNC does not
guarantee that all of the preceding commands have been executed.
Preventing the task that inserts a set of associated commands from being
migrated to another core ensures that all of the commands are inserted
into the same ECMDQ.

Signed-off-by: Zhen Lei
---
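For illustration, this is the shape each batch-submission site takes
after the change, using arm_smmu_atc_inv_master() as the example
(a condensed restatement of the corresponding hunk below):

	static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
	{
		int i, ret;
		struct arm_smmu_cmdq_ent cmd;
		struct arm_smmu_cmdq_batch cmds;
		struct arm_smmu_device *smmu = master->smmu;

		arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
		cmds.num = 0;

		arm_smmu_preempt_disable(smmu);		/* stay on one CPU, i.e. one ECMDQ */
		for (i = 0; i < master->num_streams; i++) {
			cmd.atc.sid = master->streams[i].id;
			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
		}
		ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);	/* SYNC now covers the whole batch */
		arm_smmu_preempt_enable(smmu);

		return ret;
	}

The wrappers are no-ops when ECMDQs are not in use, so the normal CMDQ
path is unaffected.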
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++++++++++----
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c64b34be8eb9181..0244f0647745a72 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -243,6 +243,18 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 	return 0;
 }
 
+static void arm_smmu_preempt_disable(struct arm_smmu_device *smmu)
+{
+	if (smmu->ecmdq_enabled)
+		preempt_disable();
+}
+
+static void arm_smmu_preempt_enable(struct arm_smmu_device *smmu)
+{
+	if (smmu->ecmdq_enabled)
+		preempt_enable();
+}
+
 /* High-level queue accessors */
 static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 {
@@ -1037,6 +1049,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
 		for (i = 0; i < master->num_streams; i++) {
@@ -1047,6 +1060,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
 }
 
 static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
@@ -1842,31 +1856,38 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
 
 static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 {
-	int i;
+	int i, ret;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_cmdq_batch cmds;
+	struct arm_smmu_device *smmu = master->smmu;
 
 	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 
 	cmds.num = 0;
+
+	arm_smmu_preempt_disable(smmu);
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.atc.sid = master->streams[i].id;
-		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
+		arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 	}
 
-	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
+	ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
+
+	return ret;
 }
 
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 			    unsigned long iova, size_t size)
 {
-	int i;
+	int i, ret;
 	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
 	struct arm_smmu_cmdq_batch cmds;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS))
 		return 0;
 
 	/*
@@ -1890,6 +1911,7 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
 		if (!master->ats_enabled)
@@ -1897,12 +1919,15 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 
 		for (i = 0; i < master->num_streams; i++) {
 			cmd.atc.sid = master->streams[i].id;
-			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
+			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 		}
 	}
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
+	ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
+
+	return ret;
 }
 
 /* IO_PGTABLE API */
@@ -1962,6 +1987,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	while (iova < end) {
 		if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
 			/*
@@ -1993,6 +2019,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 		iova += inv_range;
 	}
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
 }
 
 static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
-- 
2.34.1