[PATCH v6 3/3] tools/dma: Add dma_map_sg support

Qinxin Xia posted 3 patches 4 weeks ago
Posted by Qinxin Xia 4 weeks ago
Add support for dma_map_sg, with a new option '-m' to select the
mapping mode.

i) Users can set option '-m' to select the mode:
   DMA_MAP_BENCH_SINGLE_MODE=0, DMA_MAP_BENCH_SG_MODE=1
   (The mode is also shown in the test result.)
ii) Users can set option '-g' to set sg_nents
    (the total number of entries in the scatterlist).
    The maximum is 1024. Each sg buffer is PAGE_SIZE in size.
    e.g.
    [root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
    dma mapping mode: DMA_MAP_BENCH_SG_MODE
    dma mapping benchmark: threads:8 seconds:30 node:-1
    dir:FROM_DEVICE granule/sg_nents: 8
    average map latency(us):1.4 standard deviation:0.3
    average unmap latency(us):1.3 standard deviation:0.3
    [root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
    dma mapping mode: DMA_MAP_BENCH_SINGLE_MODE
    dma mapping benchmark: threads:8 seconds:30 node:-1
    dir:FROM_DEVICE granule/sg_nents: 8
    average map latency(us):1.0 standard deviation:0.3
    average unmap latency(us):1.3 standard deviation:0.5

Reviewed-by: Barry Song <baohua@kernel.org>
Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com>
---
 tools/dma/dma_map_benchmark.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/dma/dma_map_benchmark.c b/tools/dma/dma_map_benchmark.c
index dd0ed528e6df..143ca8dab8af 100644
--- a/tools/dma/dma_map_benchmark.c
+++ b/tools/dma/dma_map_benchmark.c
@@ -20,12 +20,19 @@ static char *directions[] = {
 	"FROM_DEVICE",
 };
 
+static char *mode[] = {
+	"SINGLE_MODE",
+	"SG_MODE",
+};
+
 int main(int argc, char **argv)
 {
 	struct map_benchmark map;
 	int fd, opt;
 	/* default single thread, run 20 seconds on NUMA_NO_NODE */
 	int threads = 1, seconds = 20, node = -1;
+	/* default single map mode */
+	int map_mode = DMA_MAP_BENCH_SINGLE_MODE;
 	/* default dma mask 32bit, bidirectional DMA */
 	int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL;
 	/* default granule 1 PAGESIZE */
@@ -33,7 +40,7 @@ int main(int argc, char **argv)
 
 	int cmd = DMA_MAP_BENCHMARK;
 
-	while ((opt = getopt(argc, argv, "t:s:n:b:d:x:g:")) != -1) {
+	while ((opt = getopt(argc, argv, "t:s:n:b:d:x:g:m:")) != -1) {
 		switch (opt) {
 		case 't':
 			threads = atoi(optarg);
@@ -56,11 +63,20 @@ int main(int argc, char **argv)
 		case 'g':
 			granule = atoi(optarg);
 			break;
+		case 'm':
+			map_mode = atoi(optarg);
+			break;
 		default:
 			return -1;
 		}
 	}
 
+	if (map_mode < 0 || map_mode >= DMA_MAP_BENCH_MODE_MAX) {
+		fprintf(stderr, "invalid map mode, SINGLE_MODE:%d, SG_MODE: %d\n",
+			DMA_MAP_BENCH_SINGLE_MODE, DMA_MAP_BENCH_SG_MODE);
+		exit(1);
+	}
+
 	if (threads <= 0 || threads > DMA_MAP_MAX_THREADS) {
 		fprintf(stderr, "invalid number of threads, must be in 1-%d\n",
 			DMA_MAP_MAX_THREADS);
@@ -110,14 +126,15 @@ int main(int argc, char **argv)
 	map.dma_dir = dir;
 	map.dma_trans_ns = xdelay;
 	map.granule = granule;
+	map.map_mode = map_mode;
 
 	if (ioctl(fd, cmd, &map)) {
 		perror("ioctl");
 		exit(1);
 	}
 
-	printf("dma mapping benchmark: threads:%d seconds:%d node:%d dir:%s granule: %d\n",
-			threads, seconds, node, directions[dir], granule);
+	printf("dma mapping benchmark(%s): threads:%d seconds:%d node:%d dir:%s granule:%d\n",
+			mode[map_mode], threads, seconds, node, directions[dir], granule);
 	printf("average map latency(us):%.1f standard deviation:%.1f\n",
 			map.avg_map_100ns/10.0, map.map_stddev/10.0);
 	printf("average unmap latency(us):%.1f standard deviation:%.1f\n",
-- 
2.33.0
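
Editorial sketch, not part of the posted patch: the '-m' handling above
uses atoi(), which turns malformed input such as "abc" into 0 and so
silently selects single mode. A stricter parse could look like the
following. The helper name parse_map_mode() is hypothetical, and the
enum is a local stand-in for the constants defined elsewhere in the
series (DMA_MAP_BENCH_MODE_MAX = 2 is an assumption that follows from
the two modes listed in the commit message).

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Local stand-ins for the constants used by the patch. The values 0 and 1
 * come from the commit message; MODE_MAX = 2 is an assumption.
 */
enum {
	DMA_MAP_BENCH_SINGLE_MODE = 0,
	DMA_MAP_BENCH_SG_MODE = 1,
	DMA_MAP_BENCH_MODE_MAX = 2,
};

/*
 * Hypothetical helper: parse a '-m' argument strictly.
 * Returns 0 and stores the mode on success, -1 on malformed or
 * out-of-range input. Unlike atoi(), "abc" and "1x" are rejected
 * instead of being read as 0 and 1.
 */
static int parse_map_mode(const char *arg, int *mode)
{
	char *end;
	long val;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno || *end != '\0')
		return -1;	/* overflow or trailing garbage */
	if (val < 0 || val >= DMA_MAP_BENCH_MODE_MAX)
		return -1;	/* not a known mode */

	*mode = (int)val;
	return 0;
}
```

In the tool, the 'm' case of the getopt() switch would call
parse_map_mode(optarg, &map_mode) and print the existing "invalid map
mode" message when it returns -1.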
Re: [PATCH v6 3/3] tools/dma: Add dma_map_sg support
Posted by Barry Song 2 weeks ago
On Mon, Jan 12, 2026 at 5:34 PM Qinxin Xia <xiaqinxin@huawei.com> wrote:
>
> Support for dma_map_sg, add option '-m' to distinguish mode.
>
> i) Users can set option '-m' to select mode:
>    DMA_MAP_BENCH_SINGLE_MODE=0, DMA_MAP_BENCH_SG_MODE:=1
>    (The mode is also show in the test result).
> ii) Users can set option '-g' to set sg_nents
>     (total count of entries in scatterlist)
>     the maximum number is 1024. Each of sg buf size is PAGE_SIZE.
>     e.g
>     [root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
>     dma mapping mode: DMA_MAP_BENCH_SG_MODE
>     dma mapping benchmark: threads:8 seconds:30 node:-1
>     dir:FROM_DEVICE granule/sg_nents: 8
>     average map latency(us):1.4 standard deviation:0.3
>     average unmap latency(us):1.3 standard deviation:0.3
>     [root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
>     dma mapping mode: DMA_MAP_BENCH_SINGLE_MODE
>     dma mapping benchmark: threads:8 seconds:30 node:-1
>     dir:FROM_DEVICE granule/sg_nents: 8
>     average map latency(us):1.0 standard deviation:0.3
>     average unmap latency(us):1.3 standard deviation:0.5
>

What happens if m is set to 0 while g is set to 8?

Thanks
Barry
Re: [PATCH v6 3/3] tools/dma: Add dma_map_sg support
Posted by Qinxin Xia 1 week, 3 days ago

On 2026/1/26 10:51:11, Barry Song <21cnbao@gmail.com> wrote:
> On Mon, Jan 12, 2026 at 5:34 PM Qinxin Xia <xiaqinxin@huawei.com> wrote:
>>
>> Support for dma_map_sg, add option '-m' to distinguish mode.
>>
>> i) Users can set option '-m' to select mode:
>>     DMA_MAP_BENCH_SINGLE_MODE=0, DMA_MAP_BENCH_SG_MODE:=1
>>     (The mode is also show in the test result).
>> ii) Users can set option '-g' to set sg_nents
>>      (total count of entries in scatterlist)
>>      the maximum number is 1024. Each of sg buf size is PAGE_SIZE.
>>      e.g
>>      [root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
>>      dma mapping mode: DMA_MAP_BENCH_SG_MODE
>>      dma mapping benchmark: threads:8 seconds:30 node:-1
>>      dir:FROM_DEVICE granule/sg_nents: 8
>>      average map latency(us):1.4 standard deviation:0.3
>>      average unmap latency(us):1.3 standard deviation:0.3
>>      [root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
>>      dma mapping mode: DMA_MAP_BENCH_SINGLE_MODE
>>      dma mapping benchmark: threads:8 seconds:30 node:-1
>>      dir:FROM_DEVICE granule/sg_nents: 8
>>      average map latency(us):1.0 standard deviation:0.3
>>      average unmap latency(us):1.3 standard deviation:0.5
>>
> 
> What happens if m is set to 0 while g is set to 8?
> 
> Thanks
> Barry

Hi Barry!
With -m 0 and -g 8, eight pages (8 * PAGE_SIZE) are mapped at a time in
single mode, as the comment in the struct map_benchmark definition says:

  __u32 granule;  /* how many PAGE_SIZE will do map/unmap once a time */

[root@localhost xqx]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
dma mapping benchmark(SINGLE_MODE): threads:8 seconds:30 node:-1
dir:FROM_DEVICE granule:8
average map latency(us):0.2 standard deviation:0.1
average unmap latency(us):4.3 standard deviation:1.4

======================================================
The newly added SG mode reuses the -g option as sg_nents, as described
in the comments:
        /*
         * Set the number of scatterlist entries based on the granule.
         * In SG mode, 'granule' represents the number of scatterlist
         * entries.
         * Each scatterlist entry corresponds to a single page.
         */

By the way, I considered testing scatterlist entries of different sizes,
but that is not easy to expose as a user parameter, so each scatterlist
entry is fixed at a single page.

Thanks,
Qinxin
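
Editorial sketch of the two meanings of '-g' described in this message,
under stated assumptions: the names below are hypothetical, the mode
values follow the commit message (single = 0, sg = 1), the page size is
fixed at 4096 for illustration (the kernel uses the platform's
PAGE_SIZE), and single mode is assumed to map one contiguous buffer.

```c
#include <stddef.h>

/* Illustrative names and values only; not taken from the kernel headers. */
enum sketch_mode { SKETCH_SINGLE_MODE = 0, SKETCH_SG_MODE = 1 };
#define SKETCH_PAGE_SIZE 4096u

/* What one map/unmap iteration covers for a given mode and granule. */
struct sketch_layout {
	unsigned int nr_bufs;	/* buffers (scatterlist entries) per map call */
	size_t buf_bytes;	/* size of each buffer in bytes */
};

static struct sketch_layout granule_layout(enum sketch_mode mode,
					   unsigned int granule)
{
	struct sketch_layout l;

	if (mode == SKETCH_SG_MODE) {
		/* SG mode: 'granule' scatterlist entries, one page each. */
		l.nr_bufs = granule;
		l.buf_bytes = SKETCH_PAGE_SIZE;
	} else {
		/* Single mode: one buffer spanning 'granule' pages. */
		l.nr_bufs = 1;
		l.buf_bytes = (size_t)granule * SKETCH_PAGE_SIZE;
	}
	return l;
}
```

With -g 8, both modes cover the same number of bytes per iteration under
these assumptions; what differs is whether that is one 8-page buffer or
eight one-page entries.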

Re: [PATCH v6 3/3] tools/dma: Add dma_map_sg support
Posted by Barry Song 1 week, 3 days ago
On Fri, Jan 30, 2026 at 4:38 PM Qinxin Xia <xiaqinxin@huawei.com> wrote:
>
>
>
> On 2026/1/26 10:51:11, Barry Song <21cnbao@gmail.com> wrote:
> > On Mon, Jan 12, 2026 at 5:34 PM Qinxin Xia <xiaqinxin@huawei.com> wrote:
> >>
> >> Support for dma_map_sg, add option '-m' to distinguish mode.
> >>
> >> i) Users can set option '-m' to select mode:
> >>     DMA_MAP_BENCH_SINGLE_MODE=0, DMA_MAP_BENCH_SG_MODE:=1
> >>     (The mode is also show in the test result).
> >> ii) Users can set option '-g' to set sg_nents
> >>      (total count of entries in scatterlist)
> >>      the maximum number is 1024. Each of sg buf size is PAGE_SIZE.
> >>      e.g
> >>      [root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
> >>      dma mapping mode: DMA_MAP_BENCH_SG_MODE
> >>      dma mapping benchmark: threads:8 seconds:30 node:-1
> >>      dir:FROM_DEVICE granule/sg_nents: 8
> >>      average map latency(us):1.4 standard deviation:0.3
> >>      average unmap latency(us):1.3 standard deviation:0.3
> >>      [root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
> >>      dma mapping mode: DMA_MAP_BENCH_SINGLE_MODE
> >>      dma mapping benchmark: threads:8 seconds:30 node:-1
> >>      dir:FROM_DEVICE granule/sg_nents: 8
> >>      average map latency(us):1.0 standard deviation:0.3
> >>      average unmap latency(us):1.3 standard deviation:0.5
> >>
> >
> > What happens if m is set to 0 while g is set to 8?
> >
> > Thanks
> > Barry
>
> Hi Barry!
> m set '0' and g set '8', This means that 8 page_sizes are mapped at a
> time in single mode.
> As the comment for the struct map_benchmark definition says:
>
>   __u32 granule;  /* how many PAGE_SIZE will do map/unmap once a time */
>
> [root@localhost xqx]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
> dma mapping benchmark(SINGLE_MODE): threads:8 seconds:30 node:-1
> dir:FROM_DEVICE granule:8
> average map latency(us):0.2 standard deviation:0.1
> average unmap latency(us):4.3 standard deviation:1.4
>
> ======================================================
> The newly added sg mode reuses the -g option as sgnents and is described
> in the comments:
>         /*
>          * Set the number of scatterlist entries based on the granule.
>          * In SG mode, 'granule' represents the number of scatterlist
>          * entries.
>          * Each scatterlist entry corresponds to a single page.
>          */
>
> By the way, I've considered testing sgnents of different sizes, but it's
> not very easy to set for user parameters, so I set it with each
> scatterlist entry corresponds to a single page.

This is a bit odd. Ideally, we shouldn’t have a mixed definition
for a single variable, but since this is just a tool, it may be
acceptable.

That said, the documentation should at least be updated in
patches 2/3 and 3/3. As it stands, it still says:

    __u32 granule; /* how many PAGE_SIZE are mapped or unmapped
                     at a time */


>
> Thanks,
> Qinxin
>
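
Editorial sketch of one possible wording for the updated field comment
requested above. This is a hypothetical excerpt, not the real struct
layout: only the two fields under discussion are shown, uint32_t stands
in for __u32, and the type of map_mode is an assumption.

```c
#include <stdint.h>

/* Hypothetical excerpt of struct map_benchmark, for illustration only. */
struct map_benchmark_excerpt {
	/*
	 * Single mode: number of PAGE_SIZE pages mapped or unmapped
	 * at a time.
	 * SG mode: number of scatterlist entries (sg_nents), each
	 * entry covering a single page.
	 */
	uint32_t granule;
	/* DMA_MAP_BENCH_SINGLE_MODE or DMA_MAP_BENCH_SG_MODE (type assumed) */
	uint32_t map_mode;
};
```

Spelling out both meanings next to the field keeps the shared '-g'
option understandable without reading the SG setup code.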
Re: [PATCH v6 3/3] tools/dma: Add dma_map_sg support
Posted by Qinxin Xia 6 days, 8 hours ago

On 2026/1/30 17:16:08, Barry Song <21cnbao@gmail.com> wrote:
> On Fri, Jan 30, 2026 at 4:38 PM Qinxin Xia <xiaqinxin@huawei.com> wrote:
>>
>>
>>
>> On 2026/1/26 10:51:11, Barry Song <21cnbao@gmail.com> wrote:
>>> On Mon, Jan 12, 2026 at 5:34 PM Qinxin Xia <xiaqinxin@huawei.com> wrote:
>>>>
>>>> Support for dma_map_sg, add option '-m' to distinguish mode.
>>>>
>>>> i) Users can set option '-m' to select mode:
>>>>      DMA_MAP_BENCH_SINGLE_MODE=0, DMA_MAP_BENCH_SG_MODE:=1
>>>>      (The mode is also show in the test result).
>>>> ii) Users can set option '-g' to set sg_nents
>>>>       (total count of entries in scatterlist)
>>>>       the maximum number is 1024. Each of sg buf size is PAGE_SIZE.
>>>>       e.g
>>>>       [root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
>>>>       dma mapping mode: DMA_MAP_BENCH_SG_MODE
>>>>       dma mapping benchmark: threads:8 seconds:30 node:-1
>>>>       dir:FROM_DEVICE granule/sg_nents: 8
>>>>       average map latency(us):1.4 standard deviation:0.3
>>>>       average unmap latency(us):1.3 standard deviation:0.3
>>>>       [root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
>>>>       dma mapping mode: DMA_MAP_BENCH_SINGLE_MODE
>>>>       dma mapping benchmark: threads:8 seconds:30 node:-1
>>>>       dir:FROM_DEVICE granule/sg_nents: 8
>>>>       average map latency(us):1.0 standard deviation:0.3
>>>>       average unmap latency(us):1.3 standard deviation:0.5
>>>>
>>>
>>> What happens if m is set to 0 while g is set to 8?
>>>
>>> Thanks
>>> Barry
>>
>> Hi Barry!
>> m set '0' and g set '8', This means that 8 page_sizes are mapped at a
>> time in single mode.
>> As the comment for the struct map_benchmark definition says:
>>
>>    __u32 granule;  /* how many PAGE_SIZE will do map/unmap once a time */
>>
>> [root@localhost xqx]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
>> dma mapping benchmark(SINGLE_MODE): threads:8 seconds:30 node:-1
>> dir:FROM_DEVICE granule:8
>> average map latency(us):0.2 standard deviation:0.1
>> average unmap latency(us):4.3 standard deviation:1.4
>>
>> ======================================================
>> The newly added sg mode reuses the -g option as sgnents and is described
>> in the comments:
>>          /*
>>           * Set the number of scatterlist entries based on the granule.
>>           * In SG mode, 'granule' represents the number of scatterlist
>>           * entries.
>>           * Each scatterlist entry corresponds to a single page.
>>           */
>>
>> By the way, I've considered testing sgnents of different sizes, but it's
>> not very easy to set for user parameters, so I set it with each
>> scatterlist entry corresponds to a single page.
> 
> This is a bit odd. Ideally, we shouldn’t have a mixed definition
> for a single variant, but since this is just a tool, it may be
> acceptable.
> 
> That said, the documentation should at least be updated in
> patches 2/3 and 3/3. As it stands, it still says:
> 
>      __u32 granule; /* how many PAGE_SIZE are mapped or unmapped
>                       at a time */
> 
> 
>>
>> Thanks,
>> Qinxin
>>
OK, I will update the documentation in the next version.
Do you have any other suggestions for this series?
-- 
Thanks,
Qinxin