From: Barry Song <21cnbao@gmail.com>
To: catalin.marinas@arm.com, m.szyprowski@samsung.com, robin.murphy@arm.com,
	will@kernel.org
Cc: ada.coupriediaz@arm.com, anshuman.khandual@arm.com, ardb@kernel.org,
	iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, maz@kernel.org, ryan.roberts@arm.com,
	surenb@google.com, v-songbaohua@oppo.com, zhengtangquan@oppo.com
Subject: [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch
Date: Fri, 19 Dec 2025 13:36:57 +0800
Message-Id: <20251219053658.84978-6-21cnbao@gmail.com>
In-Reply-To: <20251219053658.84978-1-21cnbao@gmail.com>
References: <20251219053658.84978-1-21cnbao@gmail.com>

From: Barry Song

This enables dma_direct_sync_sg_for_device(), dma_direct_sync_sg_for_cpu(),
dma_direct_map_sg(), and dma_direct_unmap_sg() to use batched DMA sync
operations when the architecture supports them. This significantly improves
performance on devices without hardware cache coherence.

Tangquan's initial results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests pinned the task to
CPU7 with the CPU frequency fixed at 2.6 GHz, ran dma_map_sg() and
dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB = 2560 sg entries per
buffer) for 200 iterations, and averaged the results.
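The win comes from replacing one synchronous cache-maintenance call per
sg entry with a queue-then-flush pattern. A rough before/after sketch
(illustrative only, not the exact diff below; coherency checks and
SWIOTLB handling are elided, and the batch_add/flush calls are the arch
hooks provided under CONFIG_ARCH_WANT_BATCHED_DMA_SYNC):

	/* Before: one synchronous cache maintenance op per sg entry */
	for_each_sg(sgl, sg, nents, i)
		arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);

	/* After: queue per-entry maintenance, then flush once */
	for_each_sg(sgl, sg, nents, i)
		arch_sync_dma_for_device_batch_add(sg_phys(sg), sg->length, dir);
	arch_sync_dma_batch_flush();

The intent is to amortize the post-maintenance synchronization cost
across the whole scatterlist rather than paying it once per entry.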
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Ada Couprie Diaz
Cc: Ard Biesheuvel
Cc: Marc Zyngier
Cc: Anshuman Khandual
Cc: Ryan Roberts
Cc: Suren Baghdasaryan
Cc: Tangquan Zheng
Signed-off-by: Barry Song
Reported-by: kernel test robot
---
 kernel/dma/direct.c | 28 ++++++++++-----
 kernel/dma/direct.h | 86 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 95 insertions(+), 19 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 50c3fe2a1d55..ed2339b0c5e7 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -403,9 +403,10 @@ void dma_direct_sync_sg_for_device(struct device *dev,
 		swiotlb_sync_single_for_device(dev, paddr, sg->length, dir);

 		if (!dev_is_dma_coherent(dev))
-			arch_sync_dma_for_device(paddr, sg->length,
-					dir);
+			arch_sync_dma_for_device_batch_add(paddr, sg->length, dir);
 	}
+	if (!dev_is_dma_coherent(dev))
+		arch_sync_dma_batch_flush();
 }
 #endif

@@ -422,7 +423,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
 		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));

 		if (!dev_is_dma_coherent(dev))
-			arch_sync_dma_for_cpu(paddr, sg->length, dir);
+			arch_sync_dma_for_cpu_batch_add(paddr, sg->length, dir);

 		swiotlb_sync_single_for_cpu(dev, paddr, sg->length, dir);

@@ -430,8 +431,10 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
 			arch_dma_mark_clean(paddr, sg->length);
 	}

-	if (!dev_is_dma_coherent(dev))
+	if (!dev_is_dma_coherent(dev)) {
 		arch_sync_dma_for_cpu_all();
+		arch_sync_dma_batch_flush();
+	}
 }

 /*
@@ -443,14 +446,19 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 {
 	struct scatterlist *sg;
 	int i;
+	bool need_sync = false;

 	for_each_sg(sgl, sg, nents, i) {
-		if (sg_dma_is_bus_address(sg))
+		if (sg_dma_is_bus_address(sg)) {
 			sg_dma_unmark_bus_address(sg);
-		else
-			dma_direct_unmap_phys(dev, sg->dma_address,
+		} else {
+			need_sync = true;
+			dma_direct_unmap_phys_batch_add(dev, sg->dma_address,
 					sg_dma_len(sg), dir, attrs);
+		}
 	}
+	if (need_sync && !dev_is_dma_coherent(dev))
+		arch_sync_dma_batch_flush();
 }
 #endif

@@ -460,6 +468,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 	struct pci_p2pdma_map_state p2pdma_state = {};
 	struct scatterlist *sg;
 	int i, ret;
+	bool need_sync = false;

 	for_each_sg(sgl, sg, nents, i) {
 		switch (pci_p2pdma_state(&p2pdma_state, dev, sg_page(sg))) {
@@ -471,7 +480,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 			 */
 			break;
 		case PCI_P2PDMA_MAP_NONE:
-			sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+			need_sync = true;
+			sg->dma_address = dma_direct_map_phys_batch_add(dev, sg_phys(sg),
 					sg->length, dir, attrs);
 			if (sg->dma_address == DMA_MAPPING_ERROR) {
 				ret = -EIO;
@@ -491,6 +501,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		sg_dma_len(sg) = sg->length;
 	}

+	if (need_sync && !dev_is_dma_coherent(dev))
+		arch_sync_dma_batch_flush();
 	return nents;

 out_unmap:
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index da2fadf45bcd..a211bab26478 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -64,15 +64,11 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
 		arch_sync_dma_for_device(paddr, size, dir);
 }

-static inline void dma_direct_sync_single_for_cpu(struct device *dev,
-		dma_addr_t addr, size_t size, enum dma_data_direction dir)
+static inline void __dma_direct_sync_single_for_cpu(struct device *dev,
+		phys_addr_t paddr, size_t size, enum dma_data_direction dir)
 {
-	phys_addr_t paddr = dma_to_phys(dev, addr);
-
-	if (!dev_is_dma_coherent(dev)) {
-		arch_sync_dma_for_cpu(paddr, size, dir);
+	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu_all();
-	}

 	swiotlb_sync_single_for_cpu(dev, paddr, size, dir);

@@ -80,7 +76,31 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		arch_dma_mark_clean(paddr, size);
 }

-static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+#ifdef CONFIG_ARCH_WANT_BATCHED_DMA_SYNC
+static inline void dma_direct_sync_single_for_cpu_batch_add(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+	phys_addr_t paddr = dma_to_phys(dev, addr);
+
+	if (!dev_is_dma_coherent(dev))
+		arch_sync_dma_for_cpu_batch_add(paddr, size, dir);
+
+	__dma_direct_sync_single_for_cpu(dev, paddr, size, dir);
+}
+#endif
+
+static inline void dma_direct_sync_single_for_cpu(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+	phys_addr_t paddr = dma_to_phys(dev, addr);
+
+	if (!dev_is_dma_coherent(dev))
+		arch_sync_dma_for_cpu(paddr, size, dir);
+
+	__dma_direct_sync_single_for_cpu(dev, paddr, size, dir);
+}
+
+static inline dma_addr_t __dma_direct_map_phys(struct device *dev,
 	phys_addr_t phys, size_t size, enum dma_data_direction dir,
 	unsigned long attrs)
 {
@@ -108,9 +128,6 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev,
 		}
 	}

-	if (!dev_is_dma_coherent(dev) &&
-	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
-		arch_sync_dma_for_device(phys, size, dir);
 	return dma_addr;

 err_overflow:
@@ -121,6 +138,53 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev,
 	return DMA_MAPPING_ERROR;
 }

+#ifdef CONFIG_ARCH_WANT_BATCHED_DMA_SYNC
+static inline dma_addr_t dma_direct_map_phys_batch_add(struct device *dev,
+	phys_addr_t phys, size_t size, enum dma_data_direction dir,
+	unsigned long attrs)
+{
+	dma_addr_t dma_addr = __dma_direct_map_phys(dev, phys, size, dir, attrs);
+
+	if (dma_addr != DMA_MAPPING_ERROR && !dev_is_dma_coherent(dev) &&
+	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
+		arch_sync_dma_for_device_batch_add(phys, size, dir);
+
+	return dma_addr;
+}
+#endif
+
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+	phys_addr_t phys, size_t size, enum dma_data_direction dir,
+	unsigned long attrs)
+{
+	dma_addr_t dma_addr = __dma_direct_map_phys(dev, phys, size, dir, attrs);
+
+	if (dma_addr != DMA_MAPPING_ERROR && !dev_is_dma_coherent(dev) &&
+	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
+		arch_sync_dma_for_device(phys, size, dir);
+
+	return dma_addr;
+}
+
+#ifdef CONFIG_ARCH_WANT_BATCHED_DMA_SYNC
+static inline void dma_direct_unmap_phys_batch_add(struct device *dev, dma_addr_t addr,
+	size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	phys_addr_t phys;
+
+	if (attrs & DMA_ATTR_MMIO)
+		/* nothing to do: uncached and no swiotlb */
+		return;
+
+	phys = dma_to_phys(dev, addr);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_direct_sync_single_for_cpu_batch_add(dev, addr, size, dir);
+
+	swiotlb_tbl_unmap_single(dev, phys, size, dir,
+				 attrs | DMA_ATTR_SKIP_CPU_SYNC);
+}
+#endif
+
 static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
 	size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-- 
2.39.3 (Apple Git-146)