While testing the fc transport I got a bit tired of waiting for the I/O jobs
to finish. Thus here are some runtime optimizations.

With a small/slow VM I got the following values:
with 'optimizations'
loop:
real 4m43.981s
user 0m17.754s
sys 2m6.249s
rdma:
real 2m35.160s
user 0m6.264s
sys 0m56.230s
tcp:
real 2m30.391s
user 0m5.770s
sys 0m46.007s
fc:
real 2m19.738s
user 0m6.012s
sys 0m42.201s
base:
loop:
real 7m35.061s
user 0m23.493s
sys 2m54.866s
rdma:
real 8m29.347s
user 0m13.078s
sys 1m53.158s
tcp:
real 8m11.357s
user 0m13.033s
sys 2m43.156s
fc:
real 5m46.615s
user 0m12.819s
sys 1m46.338s
Daniel Wagner (1):
nvme: Limit runtime for verification and limit test image size
common/xfs | 3 ++-
tests/nvme/004 | 2 +-
tests/nvme/005 | 2 +-
tests/nvme/006 | 2 +-
tests/nvme/007 | 2 +-
tests/nvme/008 | 2 +-
tests/nvme/009 | 2 +-
tests/nvme/010 | 5 +++--
tests/nvme/011 | 5 +++--
tests/nvme/012 | 4 ++--
tests/nvme/013 | 4 ++--
tests/nvme/014 | 10 ++++++++--
tests/nvme/015 | 10 ++++++++--
tests/nvme/017 | 2 +-
tests/nvme/018 | 2 +-
tests/nvme/019 | 2 +-
tests/nvme/020 | 2 +-
tests/nvme/021 | 2 +-
tests/nvme/022 | 2 +-
tests/nvme/023 | 2 +-
tests/nvme/024 | 2 +-
tests/nvme/025 | 2 +-
tests/nvme/026 | 2 +-
tests/nvme/027 | 2 +-
tests/nvme/028 | 2 +-
tests/nvme/029 | 2 +-
tests/nvme/031 | 2 +-
tests/nvme/032 | 4 ++--
tests/nvme/034 | 3 ++-
tests/nvme/035 | 4 ++--
tests/nvme/040 | 4 ++--
tests/nvme/041 | 2 +-
tests/nvme/042 | 2 +-
tests/nvme/043 | 2 +-
tests/nvme/044 | 2 +-
tests/nvme/045 | 2 +-
tests/nvme/047 | 2 +-
tests/nvme/048 | 2 +-
38 files changed, 63 insertions(+), 47 deletions(-)
--
2.40.0
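For readers who want to see what the shortlog's "limit runtime" and "limit test
image size" mean in fio terms, here is a generic sketch of a verification job
bounded both by I/O size and by wall clock time. The device path and the
numbers are placeholders, not the values from the actual patch:

  # placeholder device and limits; fio stops at whichever bound is hit first
  fio --name=verify --filename=/dev/nvme0n1 --direct=1 \
          --rw=randwrite --bs=4k --ioengine=libaio --iodepth=16 \
          --size=32m --runtime=30 \
          --verify=crc32c --verify_fatal=1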
On 4/19/23 01:56, Daniel Wagner wrote:
> While testing the fc transport I got a bit tired of wait for the I/O jobs to
> finish. Thus here some runtime optimization.
>
> With a small/slow VM I got following values:
>
> with 'optimizations'
> loop:
> real 4m43.981s
> user 0m17.754s
> sys 2m6.249s
>
> rdma:
> real 2m35.160s
> user 0m6.264s
> sys 0m56.230s
>
> tcp:
> real 2m30.391s
> user 0m5.770s
> sys 0m46.007s
>
> fc:
> real 2m19.738s
> user 0m6.012s
> sys 0m42.201s
>
> base:
> loop:
> real 7m35.061s
> user 0m23.493s
> sys 2m54.866s
>
> rdma:
> real 8m29.347s
> user 0m13.078s
> sys 1m53.158s
>
> tcp:
> real 8m11.357s
> user 0m13.033s
> sys 2m43.156s
>
> fc:
> real 5m46.615s
> user 0m12.819s
> sys 1m46.338s
>
>

Those jobs are meant to be run for at least 1G to establish
confidence on the data set and the system under test since SSDs
are in TBs nowadays and we don't even get anywhere close to that,
with your suggestion we are going even lower ...

we cannot change the dataset size for slow VMs, instead add
a command line argument and pass it to tests e.g.
nvme_verification_size=XXX similar to nvme_trtype but don't change
the default values which we have been testing for years now

Testing is supposed to be time consuming especially verification jobs..

-ck
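For illustration, a minimal sketch of the suggested knob, modelled on how
blktests already picks up settings such as nvme_trtype from its config; the
default shown and the helper name are assumptions, not an existing
implementation:

  # keep today's 1G default unless the config or environment overrides it
  : "${nvme_verification_size:=1G}"

  # hypothetical helper showing where such a knob would be consumed
  _run_verify_job() {
          local dev="$1"

          fio --name=verify --filename="$dev" --direct=1 \
                  --rw=randwrite --bs=4k --ioengine=libaio --iodepth=16 \
                  --size="$nvme_verification_size" \
                  --verify=crc32c --verify_fatal=1
  }

A slow VM could then shrink the size in its local config while CI setups keep
the 1G default that has been tested for years.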
>> While testing the fc transport I got a bit tired of wait for the I/O jobs to
>> finish. Thus here some runtime optimization.
>>
>> With a small/slow VM I got following values:
>>
>> with 'optimizations'
>> loop:
>> real 4m43.981s
>> user 0m17.754s
>> sys 2m6.249s

How come loop is doubling the time with this patch?
ratio is not the same before and after.

>>
>> rdma:
>> real 2m35.160s
>> user 0m6.264s
>> sys 0m56.230s
>>
>> tcp:
>> real 2m30.391s
>> user 0m5.770s
>> sys 0m46.007s
>>
>> fc:
>> real 2m19.738s
>> user 0m6.012s
>> sys 0m42.201s
>>
>> base:
>> loop:
>> real 7m35.061s
>> user 0m23.493s
>> sys 2m54.866s
>>
>> rdma:
>> real 8m29.347s
>> user 0m13.078s
>> sys 1m53.158s
>>
>> tcp:
>> real 8m11.357s
>> user 0m13.033s
>> sys 2m43.156s
>>
>> fc:
>> real 5m46.615s
>> user 0m12.819s
>> sys 1m46.338s
>>
>>
>
> Those jobs are meant to be run for at least 1G to establish
> confidence on the data set and the system under test since SSDs
> are in TBs nowadays and we don't even get anywhere close to that,
> with your suggestion we are going even lower ...

Where does the 1G boundary coming from?

> we cannot change the dataset size for slow VMs, instead add
> a command line argument and pass it to tests e.g.
> nvme_verification_size=XXX similar to nvme_trtype but don't change
> the default values which we have been testing for years now
>
> Testing is supposed to be time consuming especially verification jobs..

I like the idea, but I think it may need to be the other way around.
Have shortest possible runs by default.
>> we cannot change the dataset size for slow VMs, instead add
>> a command line argument and pass it to tests e.g.
>> nvme_verification_size=XXX similar to nvme_trtype but don't change
>> the default values which we have been testing for years now
>>
>> Testing is supposed to be time consuming especially verification jobs..
>
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

not everyone is running blktests on the slow vms, so I think it should be
the other way around, the default integration of these testcases using 1G
size in various distros, and it is not a good idea to change that so
everyone else who are not running slow vms who should update their
testscripts ...

-ck
On 4/19/23 02:50, Sagi Grimberg wrote:
>
>>> While testing the fc transport I got a bit tired of wait for the I/O
>>> jobs to
>>> finish. Thus here some runtime optimization.
>>>
>>> With a small/slow VM I got following values:
>>>
>>> with 'optimizations'
>>> loop:
>>> real 4m43.981s
>>> user 0m17.754s
>>> sys 2m6.249s
>
> How come loop is doubling the time with this patch?
> ratio is not the same before and after.
>
>>>
>>> rdma:
>>> real 2m35.160s
>>> user 0m6.264s
>>> sys 0m56.230s
>>>
>>> tcp:
>>> real 2m30.391s
>>> user 0m5.770s
>>> sys 0m46.007s
>>>
>>> fc:
>>> real 2m19.738s
>>> user 0m6.012s
>>> sys 0m42.201s
>>>
>>> base:
>>> loop:
>>> real 7m35.061s
>>> user 0m23.493s
>>> sys 2m54.866s
>>>
>>> rdma:
>>> real 8m29.347s
>>> user 0m13.078s
>>> sys 1m53.158s
>>>
>>> tcp:
>>> real 8m11.357s
>>> user 0m13.033s
>>> sys 2m43.156s
>>>
>>> fc:
>>> real 5m46.615s
>>> user 0m12.819s
>>> sys 1m46.338s
>>>
>>>
>>
>> Those jobs are meant to be run for at least 1G to establish
>> confidence on the data set and the system under test since SSDs
>> are in TBs nowadays and we don't even get anywhere close to that,
>> with your suggestion we are going even lower ...
>
> Where does the 1G boundary coming from?
>

I wrote these testcases 3 times, initially they were the part of
nvme-cli tests 7-8 years ago, then nvmftests 7-6 years ago, then they
moved to blktests.

In that time some of the testcases would not fail on with small size
such as less than 512MB especially with verification but they were
in the errors with 1G Hence I kept to be 1G.

Now I don't remember why I didn't use bigger size than 1G
should have documented that somewhere ...

>> we cannot change the dataset size for slow VMs, instead add
>> a command line argument and pass it to tests e.g.
>> nvme_verification_size=XXX similar to nvme_trtype but don't change
>> the default values which we have been testing for years now
>>
>> Testing is supposed to be time consuming especially verification jobs..
>
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

see above..

-ck
On Wed, Apr 19, 2023 at 09:11:33PM +0000, Chaitanya Kulkarni wrote:
> >> Those jobs are meant to be run for at least 1G to establish
> >> confidence on the data set and the system under test since SSDs
> >> are in TBs nowadays and we don't even get anywhere close to that,
> >> with your suggestion we are going even lower ...
> >
> > Where does the 1G boundary coming from?
> >
>
> I wrote these testcases 3 times, initially they were the part of
> nvme-cli tests7-8 years ago, then nvmftests 7-6 years ago, then they
> moved to blktests.
>
> In that time some of the testcases would not fail on with small size
> such as less than 512MB especially with verification but they were
> in the errors with 1G Hence I kept to be 1G.
>
> Now I don't remember why I didn't use bigger size than 1G
> should have documented that somewhere ...

Can you remember why you chosed to set the image size to 1G and the io size for
fio to 950m in nvme/012 and nvme/013?

I am testing various image sizes and found that small images e.g in the range
of [4..64]m are passing fine but larger ones like [512-...]M do not (no space
left). Note I've added a calc function which does image size - 1M to leave
some room left.
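The "calc function" mentioned at the end is not shown in this mail; a
hypothetical version of it, with an assumed name and a MiB-based interface,
could look like this:

  # derive the fio I/O size from the backing image size, leaving 1 MiB
  # of headroom for file system metadata (sketch, not the actual helper)
  _nvme_calc_io_size() {
          local img_size_mb="$1"

          echo "$((img_size_mb - 1))m"
  }

  # e.g. _nvme_calc_io_size 1024 prints "1023m" for a 1G image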
On Thu, Apr 20, 2023 at 10:24:15AM +0200, Daniel Wagner wrote:
> On Wed, Apr 19, 2023 at 09:11:33PM +0000, Chaitanya Kulkarni wrote:
> > >> Those jobs are meant to be run for at least 1G to establish
> > >> confidence on the data set and the system under test since SSDs
> > >> are in TBs nowadays and we don't even get anywhere close to that,
> > >> with your suggestion we are going even lower ...
> > >
> > > Where does the 1G boundary coming from?
> > >
> >
> > I wrote these testcases 3 times, initially they were the part of
> > nvme-cli tests7-8 years ago, then nvmftests 7-6 years ago, then they
> > moved to blktests.
> >
> > In that time some of the testcases would not fail on with small size
> > such as less than 512MB especially with verification but they were
> > in the errors with 1G Hence I kept to be 1G.
> >
> > Now I don't remember why I didn't use bigger size than 1G
> > should have documented that somewhere ...
>
> Can you remember why you chosed to set the image size to 1G and the io size for
> fio to 950m in nvme/012 and nvme/013?
forget it, found a commit message which explains it
e5bd71872b3b ("nvme/012,013,035: change fio I/O size and move size definition place")
[...]
Change fio I/O size of nvme/012,013,035 from 950m to 900m, since recent change
increased the xfs log size and it caused fio failure with I/O size 950m.
On Wed, Apr 19, 2023 at 12:50:10PM +0300, Sagi Grimberg wrote:
>
> > > While testing the fc transport I got a bit tired of wait for the I/O jobs to
> > > finish. Thus here some runtime optimization.
> > >
> > > With a small/slow VM I got following values:
> > >
> > > with 'optimizations'
> > > loop:
> > > real 4m43.981s
> > > user 0m17.754s
> > > sys 2m6.249s
>
> How come loop is doubling the time with this patch?
> ratio is not the same before and after.
first run was with loop, second one with rdma:
nvme/002 (create many subsystems and test discovery) [not run]
runtime 82.089s ...
nvme_trtype=rdma is not supported in this test
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
runtime 39.948s ...
nvme_trtype=rdma is not supported in this test
nvme/017 (create/delete many file-ns and test discovery) [not run]
runtime 40.237s ...
nvme/047 (test different queue types for fabric transports) [passed]
runtime ... 13.580s
nvme/048 (Test queue count changes on reconnect) [passed]
runtime ... 6.287s
82 + 40 + 40 - 14 - 6 = 142. So loop runs additional tests. Hmm, though my
optimization didn't work there...
> > Those jobs are meant to be run for at least 1G to establish
> > confidence on the data set and the system under test since SSDs
> > are in TBs nowadays and we don't even get anywhere close to that,
> > with your suggestion we are going even lower ...
>
> Where does the 1G boundary coming from?
No idea, it's just the existing hard-coded values. I guess it might be from
efa06fcf3c83 ("loop: test partition scanning") which was the first real test
case (according to the logs).
> > we cannot change the dataset size for slow VMs, instead add
> > a command line argument and pass it to tests e.g.
> > nvme_verification_size=XXX similar to nvme_trtype but don't change
> > the default values which we have been testing for years now
> >
> > Testing is supposed to be time consuming especially verification jobs..
>
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.
Good point, I'll make it configurable. What is a good small default then? There
are some test cases in loop which allocate a 1M file. That's probably too
small.
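Assuming the knob ends up being called nvme_verification_size as suggested
earlier in the thread (the name is not final), a per-run override would look
like:

  nvme_verification_size=256M nvme_trtype=tcp ./check nvme/012

so a reduced size only applies where someone explicitly asks for it.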
>>>> While testing the fc transport I got a bit tired of wait for the I/O jobs to
>>>> finish. Thus here some runtime optimization.
>>>>
>>>> With a small/slow VM I got following values:
>>>>
>>>> with 'optimizations'
>>>> loop:
>>>> real 4m43.981s
>>>> user 0m17.754s
>>>> sys 2m6.249s
>>
>> How come loop is doubling the time with this patch?
>> ratio is not the same before and after.
>
> first run was with loop, second one with rdma:
>
> nvme/002 (create many subsystems and test discovery) [not run]
> runtime 82.089s ...
> nvme_trtype=rdma is not supported in this test
>
> nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
> runtime 39.948s ...
> nvme_trtype=rdma is not supported in this test
> nvme/017 (create/delete many file-ns and test discovery) [not run]
> runtime 40.237s ...
>
> nvme/047 (test different queue types for fabric transports) [passed]
> runtime ... 13.580s
> nvme/048 (Test queue count changes on reconnect) [passed]
> runtime ... 6.287s
>
> 82 + 40 + 40 - 14 - 6 = 142. So loop runs additional tests. Hmm, though my
> optimization didn't work there...
How come loop is 4m+ while the others are 2m+ when before all
were the same timeframe more or less?
>
>>> Those jobs are meant to be run for at least 1G to establish
>>> confidence on the data set and the system under test since SSDs
>>> are in TBs nowadays and we don't even get anywhere close to that,
>>> with your suggestion we are going even lower ...
>>
>> Where does the 1G boundary coming from?
>
> No idea, it just the existing hard coded values. I guess it might be from
> efa06fcf3c83 ("loop: test partition scanning") which was the first real test
> case (according the logs).
Was asking Chaitanya why is 1G considered sufficient vs. other sizes?
Why not 10G? Why not 100M?
On 4/19/23 06:15, Sagi Grimberg wrote:
>
>>>>> While testing the fc transport I got a bit tired of wait for the
>>>>> I/O jobs to
>>>>> finish. Thus here some runtime optimization.
>>>>>
>>>>> With a small/slow VM I got following values:
>>>>>
>>>>> with 'optimizations'
>>>>> loop:
>>>>> real 4m43.981s
>>>>> user 0m17.754s
>>>>> sys 2m6.249s
>>>
>>> How come loop is doubling the time with this patch?
>>> ratio is not the same before and after.
>>
>> first run was with loop, second one with rdma:
>>
>> nvme/002 (create many subsystems and test discovery) [not run]
>> runtime 82.089s ...
>> nvme_trtype=rdma is not supported in this test
>>
>> nvme/016 (create/delete many NVMeOF block device-backed ns and test
>> discovery) [not run]
>> runtime 39.948s ...
>> nvme_trtype=rdma is not supported in this test
>> nvme/017 (create/delete many file-ns and test discovery) [not run]
>> runtime 40.237s ...
>>
>> nvme/047 (test different queue types for fabric transports) [passed]
>> runtime ... 13.580s
>> nvme/048 (Test queue count changes on reconnect) [passed]
>> runtime ... 6.287s
>>
>> 82 + 40 + 40 - 14 - 6 = 142. So loop runs additional tests. Hmm,
>> though my
>> optimization didn't work there...
>
> How come loop is 4m+ while the others are 2m+ when before all
> were the same timeframe more or less?
>
>>
>>>> Those jobs are meant to be run for at least 1G to establish
>>>> confidence on the data set and the system under test since SSDs
>>>> are in TBs nowadays and we don't even get anywhere close to that,
>>>> with your suggestion we are going even lower ...
>>>
>>> Where does the 1G boundary coming from?
>>
>> No idea, it just the existing hard coded values. I guess it might be
>> from
>> efa06fcf3c83 ("loop: test partition scanning") which was the first
>> real test
>> case (according the logs).
>
> Was asking Chaitanya why is 1G considered sufficient vs. other sizes?
> Why not 10G? Why not 100M?
See the earlier response ...
-ck