[PATCH v4 00/13] support non power of 2 zoned devices

Pankaj Raghav posted 13 patches 3 years, 11 months ago
There is a newer version of this series
[PATCH v4 00/13] support non power of 2 zoned devices
Posted by Pankaj Raghav 3 years, 11 months ago
- Background and Motivation:

The zone storage implementation in Linux, introduced in v4.10, first
targeted SMR drives, which have a power of 2 (po2) zone size alignment
requirement. The po2 zone size was further imposed implicitly by the
block layer's blk_queue_chunk_sectors(), used to prevent IO merging
across chunks beyond the specified size, since v3.16 through commit
762380ad9322 ("block: add notion of a chunk size for request merging").
But this same general block layer po2 requirement for blk_queue_chunk_sectors()
was removed in v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
to be non-power-of-2").

NAND, which is the media used in newer zoned storage devices, does not
naturally align to po2. In these devices, the zone cap is not the same as the
po2 zone size. When zone cap != zone size, unmapped LBAs are
introduced to cover the space between the zone cap and the zone size. The po2
requirement does not make sense for this type of zoned storage device.
This patch series aims to remove these unmapped LBAs for zoned devices when
the zone cap is npo2. This is done by relaxing the po2 zone size constraint
in the kernel and allowing zoned devices with npo2 zone sizes if zone cap
== zone size.
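
To make the size of the hole concrete, here is a minimal userspace sketch
(not taken from the series). It assumes the device pads each zone up to the
next power of 2, which is an assumption for illustration only; the helper
names roundup_pow2() and unmapped_sectors_per_zone() are hypothetical, and
sizes are in 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n. n must be non-zero and at most 2^63. */
uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n != 0 && n <= (UINT64_C(1) << 63));
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of unmapped sectors per zone when the zone size is padded up to
 * the next power of two (assumed padding policy, for illustration).
 */
uint64_t unmapped_sectors_per_zone(uint64_t zone_cap_sectors)
{
	return roundup_pow2(zone_cap_sectors) - zone_cap_sectors;
}
```

Under that assumption, a 96M zone cap (196608 sectors) sits in a 128M zone
(262144 sectors), leaving 65536 sectors (32M) unmapped in every zone. With
zone cap == zone size the hole disappears.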

Removing the po2 requirement from zone storage should be possible
now, provided that no userspace regressions and no performance regressions are
introduced. Stop-gap patches have already been merged into f2fs-tools to
proactively disallow npo2 zone sizes until proper support is added [0].
Additional kernel stop-gap patches are provided in this series for dm-zoned.
Support for npo2 zone sizes in zonefs and btrfs is addressed in this series.

There was a previous effort [1] to add support for non po2 devices via
device level emulation, but that was rejected with a final conclusion
to add support for non po2 zoned devices in the complete stack [2].

- Patchset description:
This patchset aims at adding support for non power of 2 zoned devices in
the block layer, nvme layer and null blk, and adds support to btrfs and
zonefs.

This round of patches **will not** support DM layer for non
power of 2 zoned devices. More about this in the future work section.

Patches 1-2 deal with removing the po2 constraint from the
block layer.

Patches 3-4 deal with removing the constraint from nvme zns.

Patches 5-9 add support to btrfs for non po2 zoned devices.

Patch 10 removes the po2 constraint in ZoneFS.

Patches 11-12 remove the po2 constraint in null blk.

Patch 13 adds conditions to not allow non power of 2 devices in
DM.

The patch series is based on linux-next tag: next-20220502

- Performance:
PO2 zone sizes allow log and shift operations to be used instead of division
when determining alignment, zone number, etc. The same math cannot be used
with a zoned device that has a non po2 zone size. Hence, to avoid any performance
regression on zoned devices with po2 zone sizes, the optimized math in the
hot paths has been retained with branching.
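
The branching described above can be sketched in userspace C roughly as
follows. This is an illustration under assumptions, not the code from the
patches: the names is_pow2(), ilog2_u64(), zone_no() and is_zone_start() are
hypothetical, and a real hot path would presumably cache the shift instead of
recomputing it on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of two. */
bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Floor of log2(n) for n >= 1; exact when n is a power of two. */
unsigned int ilog2_u64(uint64_t n)
{
	unsigned int shift = 0;

	assert(n != 0);
	while (n > 1) {
		n >>= 1;
		shift++;
	}
	return shift;
}

/* Zone index of a sector: shift for po2 zone sizes, division otherwise. */
uint64_t zone_no(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);
	if (is_pow2(zone_sectors))
		return sector >> ilog2_u64(zone_sectors);
	return sector / zone_sectors;
}

/* Alignment check: mask for po2 zone sizes, modulo otherwise. */
bool is_zone_start(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);
	if (is_pow2(zone_sectors))
		return (sector & (zone_sectors - 1)) == 0;
	return sector % zone_sectors == 0;
}
```

Both branches give the same answers as plain division and modulo; the po2
branch only exists so that existing devices keep the cheaper shift and mask.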

The performance was measured using null blk for regression
and the results have been posted in the appropriate commit log. No
performance regression was noticed.

- Testing
With respect to testing we need to tackle two things: regression
on po2 zoned devices and progression on non po2 zoned devices.

kdevops (https://github.com/mcgrof/kdevops) was extensively used to
automate the testing for blktests and (x)fstests for btrfs changes. The
known failures were excluded during the test based on the baseline
v5.17.0-rc7.

-- regression
Emulated zoned device with zone size = 128M, nr_zones = 10000

Block and nvme zns:
blktests were run with no new failures

Btrfs:
Changes were tested with the following profile in QEMU:
[btrfs_simple_zns]
TEST_DIR=<dir>
SCRATCH_MNT=<mnt>
FSTYP=btrfs
MKFS_OPTIONS="-f -d single -m single"
TEST_DEV=<dev>
SCRATCH_DEV_POOL=<dev-pool>

No new failures were observed in btrfs, generic and shared test suite

ZoneFS:
zonefs-tests-nullblk.sh and zonefs-tests.sh from zonefs-tools were run
with no failures.

nullblk:
t/zbd/run-tests-against-nullb from fio was run with no failures.

DM:
It was verified that dm-zoned mounts successfully without any
error.

-- progression
Emulated zoned device with zone size = 96M, nr_zones = 10000

Block and nvme zns:
blktests were run with no new failures

Btrfs:
Same profile as po2 zone size was used.

Many tests in xfstests for btrfs included dm-flakey and some tests
required dm-linear. As they are not supported at the moment for non
po2 devices, those **tests were excluded for non po2 devices**.

No new failures were observed in btrfs, generic and shared test suite

ZoneFS:
zonefs-tests.sh from zonefs-tools was run with no failures.

nullblk:
A new section was added to cover non po2 devices:

section14()
{
       conv_pcnt=10
       zone_size=3
       zone_capacity=3
       max_open=${set_max_open}
       zbd_test_opts+=("-o ${max_open}")
}
t/zbd/run-tests-against-nullb from fio was run with no failures.

DM:
It was verified that dm-zoned does not mount.

- Tools:
Some tools had to be updated to support non po2 devices. Once these
patches are accepted in the kernel, these tool updates will also be
upstreamed.
* btrfs-progs: https://github.com/Panky-codes/btrfs-progs/tree/remove-po2-btrfs
* blkzone: https://github.com/Panky-codes/util-linux/tree/remove-po2
* zonefs-tools: https://github.com/Panky-codes/zonefs-tools/tree/remove-po2

- Future work
To reduce the amount of changes and testing, support for DM was
excluded in this round of patches. The plan is to add support for F2FS
and DM in the near future.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?h=dev-test&id=6afcf6493578e77528abe65ab8b12f3e1c16749f
[1] https://lore.kernel.org/all/20220310094725.GA28499@lst.de/T/
[2] https://lore.kernel.org/all/20220315135245.eqf4tqngxxb7ymqa@unifi/

Changes since v1:
- Put the function declaration and its usage in the same commit (Bart)
- Remove bdev_zone_aligned function (Bart)
- Change the name from blk_queue_zone_aligned to blk_queue_is_zone_start
  (Damien)
- q is never null from bdev_get_queue (Damien)
- Add condition during bringup and check for zsze == zcap for npo2
  drives (Damien)
- Rounddown operation should be made generic to work in 32 bits arch
  (Bart)
- Add comments where generic calculation is directly used instead of having
  special handling for po2 zone sizes (Hannes)
- Make the minimum zone size alignment requirement for btrfs to be 1M
  instead of BTRFS_STRIPE_LEN (David)
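
Two of the v1 review items above (the zsze == zcap bring-up check and the
generic round-down) can be sketched as follows. This is a userspace
illustration with hypothetical names, zone_geometry_supported() and
zone_start(); it is not the code from the patches. In-kernel code cannot use
a plain 64-bit `%` on 32-bit architectures and has to go through the kernel's
64-bit division helpers, which is presumably what the 32-bit comment refers
to; the sketch sidesteps that by running in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of two. */
bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Bring-up check as described in the changelog: an npo2 zone size is
 * accepted only when it equals the zone capacity. Sizes are in sectors.
 */
bool zone_geometry_supported(uint64_t zone_size, uint64_t zone_cap)
{
	if (zone_size == 0 || zone_cap == 0 || zone_cap > zone_size)
		return false;
	if (is_pow2(zone_size))
		return true;
	return zone_size == zone_cap;
}

/* Generic round-down of a sector to the start of its zone. */
uint64_t zone_start(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);
	return sector - (sector % zone_sectors);
}
```

With 96M zones (196608 sectors), a geometry of size 196608 / cap 196608 is
accepted, while size 196608 / cap 131072 is rejected because it would
reintroduce a hole on an npo2 zone size.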

Changes since v2:
- Minor formatting changes

Changes since v3:
- Make superblock mirror align with the existing superblock log offsets
  (David)
- DM change return value and remove extra newline
- Optimize null blk zone index lookup with shift for po2 zone size

Luis Chamberlain (1):
  dm-zoned: ensure only power of 2 zone sizes are allowed

Pankaj Raghav (12):
  block: make blkdev_nr_zones and blk_queue_zone_no generic for npo2
    zsze
  block: allow blk-zoned devices to have non-power-of-2 zone size
  nvme: zns: Allow ZNS drives that have non-power_of_2 zone size
  nvmet: Allow ZNS target to support non-power_of_2 zone sizes
  btrfs: zoned: Cache superblock location in btrfs_zoned_device_info
  btrfs: zoned: Make sb_zone_number function non power of 2 compatible
  btrfs: zoned: use generic btrfs zone helpers to support npo2 zoned
    devices
  btrfs:zoned: make sb for npo2 zone devices align with sb log offsets
  btrfs: zoned: relax the alignment constraint for zoned devices
  zonefs: allow non power of 2 zoned devices
  null_blk: allow non power of 2 zoned devices
  null_blk: use zone_size_sects_shift for power of 2 zoned devices

 block/blk-core.c                  |   3 +-
 block/blk-zoned.c                 |  40 +++++--
 drivers/block/null_blk/main.c     |   5 +-
 drivers/block/null_blk/null_blk.h |   6 +
 drivers/block/null_blk/zoned.c    |  20 ++--
 drivers/md/dm-zone.c              |  12 ++
 drivers/nvme/host/zns.c           |  24 ++--
 drivers/nvme/target/zns.c         |   2 +-
 fs/btrfs/volumes.c                |  24 ++--
 fs/btrfs/zoned.c                  | 191 +++++++++++++++++++++---------
 fs/btrfs/zoned.h                  |  44 ++++++-
 fs/zonefs/super.c                 |   6 +-
 fs/zonefs/zonefs.h                |   1 -
 include/linux/blkdev.h            |  37 +++++-
 14 files changed, 303 insertions(+), 112 deletions(-)

-- 
2.25.1
Re: [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Christoph Hellwig 3 years, 11 months ago
I'm a little surprised about all this activity.

I thought the conclusion at LSF/MM was that for Linux itself there
is very little benefit in supporting this scheme.  It will massively
fragment the supported base of devices and applications, while only
having the benefit of supporting some Samsung legacy devices.

So my impression was that this work, while technically feasible, is
rather useless.  So unless I missed something important I have no
interest in supporting this in NVMe.
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Theodore Ts'o 3 years, 11 months ago
On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
> I'm a little surprised about all this activity.
> 
> I thought the conclusion at LSF/MM was that for Linux itself there
> is very little benefit in supporting this scheme.  It will massively
> fragment the supported base of devices and applications, while only
> having the benefit of supporting some Samsung legacy devices.

FWIW,

That wasn't my impression from that LSF/MM session, but once the
videos become available, folks can decide for themselves.

					- Ted
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Damien Le Moal 3 years, 11 months ago
On 5/18/22 00:34, Theodore Ts'o wrote:
> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>> I'm a little surprised about all this activity.
>>
>> I thought the conclusion at LSF/MM was that for Linux itself there
>> is very little benefit in supporting this scheme.  It will massively
>> fragment the supported base of devices and applications, while only
>> having the benefit of supporting some Samsung legacy devices.
> 
> FWIW,
> 
> That wasn't my impression from that LSF/MM session, but once the
> videos become available, folks can decide for themselves.

There was no real discussion about zone size constraint on the zone
storage BoF. Many discussions happened in the hallway track though.

-- 
Damien Le Moal
Western Digital Research
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Luis Chamberlain 3 years, 11 months ago
On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
> On 5/18/22 00:34, Theodore Ts'o wrote:
> > On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
> >> I'm a little surprised about all this activity.
> >>
> >> I thought the conclusion at LSF/MM was that for Linux itself there
> >> is very little benefit in supporting this scheme.  It will massively
> >> fragment the supported base of devices and applications, while only
> >> having the benefit of supporting some Samsung legacy devices.
> > 
> > FWIW,
> > 
> > That wasn't my impression from that LSF/MM session, but once the
> > videos become available, folks can decide for themselves.
> 
> There was no real discussion about zone size constraint on the zone
> storage BoF. Many discussions happened in the hallway track though.

Right so no direct clear blockers mentioned at all during the BoF.

  Luis
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Damien Le Moal 3 years, 11 months ago
On 5/19/22 12:12, Luis Chamberlain wrote:
> On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>> On 5/18/22 00:34, Theodore Ts'o wrote:
>>> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>> I'm a little surprised about all this activity.
>>>>
>>>> I thought the conclusion at LSF/MM was that for Linux itself there
>>>> is very little benefit in supporting this scheme.  It will massively
>>>> fragment the supported base of devices and applications, while only
>>>> having the benefit of supporting some Samsung legacy devices.
>>>
>>> FWIW,
>>>
>>> That wasn't my impression from that LSF/MM session, but once the
>>> videos become available, folks can decide for themselves.
>>
>> There was no real discussion about zone size constraint on the zone
>> storage BoF. Many discussions happened in the hallway track though.
> 
> Right so no direct clear blockers mentioned at all during the BoF.

Nor any clear OK.

> 
>   Luis


-- 
Damien Le Moal
Western Digital Research
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Johannes Thumshirn 3 years, 11 months ago
On 19/05/2022 05:19, Damien Le Moal wrote:
> On 5/19/22 12:12, Luis Chamberlain wrote:
>> On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>>> On 5/18/22 00:34, Theodore Ts'o wrote:
>>>> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>>> I'm a little surprised about all this activity.
>>>>>
>>>>> I thought the conclusion at LSF/MM was that for Linux itself there
>>>>> is very little benefit in supporting this scheme.  It will massively
>>>>> fragment the supported base of devices and applications, while only
>>>>> having the benefit of supporting some Samsung legacy devices.
>>>>
>>>> FWIW,
>>>>
>>>> That wasn't my impression from that LSF/MM session, but once the
>>>> videos become available, folks can decide for themselves.
>>>
>>> There was no real discussion about zone size constraint on the zone
>>> storage BoF. Many discussions happened in the hallway track though.
>>
>> Right so no direct clear blockers mentioned at all during the BoF.
> 
> Nor any clear OK.

So what about creating a device-mapper target, that's taking npo2 drives and
makes them po2 drives for the FS layers? It will be very similar code to 
dm-linear.

After all zoned support for FSes started with a device-mapper (dm-zoned) and 
as the need for a more integrated solution arose, it changed into native
support.

And all that is there is simple arithmetic and a bio_clone(), if this is the
slowest part of the stack involving a FS like f2fs or btrfs I'm throwing a
round of anyone's favorite beverage at next year's LSFMM.
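
As a rough illustration of the "simple arithmetic" such a target would need,
here is a userspace sketch of the sector remapping only. It assumes the
target presents each npo2 device zone inside a larger po2 emulated zone and
treats the tail of each emulated zone as a hole; struct zone_map and
remap_sector() are hypothetical names, and nothing here is taken from an
actual device-mapper target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zone geometry of the hypothetical target, in 512-byte sectors. */
struct zone_map {
	uint64_t dev_zone_sectors; /* npo2 zone size on the device */
	uint64_t emu_zone_sectors; /* po2 zone size presented to the FS */
};

/*
 * Translate a sector in the emulated (po2) address space to a sector on
 * the device. Returns false if the sector falls into the emulated hole
 * between the device zone size and the po2 zone size.
 */
bool remap_sector(const struct zone_map *m, uint64_t emu_sector,
		  uint64_t *dev_sector)
{
	uint64_t zone, offset;

	assert(m->dev_zone_sectors != 0);
	assert(m->emu_zone_sectors >= m->dev_zone_sectors);
	/* The emulated zone size must be a power of two for the mask below. */
	assert((m->emu_zone_sectors & (m->emu_zone_sectors - 1)) == 0);

	zone = emu_sector / m->emu_zone_sectors;
	offset = emu_sector & (m->emu_zone_sectors - 1);
	if (offset >= m->dev_zone_sectors)
		return false;

	*dev_sector = zone * m->dev_zone_sectors + offset;
	return true;
}
```

For a device with 96M zones (196608 sectors) presented as 128M zones (262144
sectors), emulated sector 524388 (zone 2, offset 100) maps to device sector
393316, while emulated sector 458752 (zone 1, offset 196608) lands in the
hole and is rejected.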

Byte,
	Johannes

Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Damien Le Moal 3 years, 11 months ago
On 5/19/22 16:34, Johannes Thumshirn wrote:
> On 19/05/2022 05:19, Damien Le Moal wrote:
>> On 5/19/22 12:12, Luis Chamberlain wrote:
>>> On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>>>> On 5/18/22 00:34, Theodore Ts'o wrote:
>>>>> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>>>> I'm a little surprised about all this activity.
>>>>>>
>>>>>> I thought the conclusion at LSF/MM was that for Linux itself there
>>>>>> is very little benefit in supporting this scheme.  It will massively
>>>>>> fragment the supported base of devices and applications, while only
>>>>>> having the benefit of supporting some Samsung legacy devices.
>>>>>
>>>>> FWIW,
>>>>>
>>>>> That wasn't my impression from that LSF/MM session, but once the
>>>>> videos become available, folks can decide for themselves.
>>>>
>>>> There was no real discussion about zone size constraint on the zone
>>>> storage BoF. Many discussions happened in the hallway track though.
>>>
>>> Right so no direct clear blockers mentioned at all during the BoF.
>>
>> Nor any clear OK.
> 
> So what about creating a device-mapper target, that's taking npo2 drives and
> makes them po2 drives for the FS layers? It will be very similar code to 
> dm-linear.

+1

This will simplify the support for FSes, at least for the initial drop (if
accepted).

And more importantly, this will also allow addressing any potential
problem with user space breaking because of the non power of 2 zone size.

> 
> After all zoned support for FSes started with a device-mapper (dm-zoned) and 
> as the need for a more integrated solution arose, it changed into native
> support.
> 
> And all that is there is simple arithmetic and a bio_clone(), if this is the
> slowest part of the stack involving a FS like f2fs or btrfs I'm throwing a
> round of anyone's favorite beverage at next year's LSFMM.
> 
> Byte,
> 	Johannes
> 


-- 
Damien Le Moal
Western Digital Research
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Hannes Reinecke 3 years, 11 months ago
On 5/19/22 20:47, Damien Le Moal wrote:
> On 5/19/22 16:34, Johannes Thumshirn wrote:
>> On 19/05/2022 05:19, Damien Le Moal wrote:
>>> On 5/19/22 12:12, Luis Chamberlain wrote:
>>>> On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>>>>> On 5/18/22 00:34, Theodore Ts'o wrote:
>>>>>> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>>>>> I'm a little surprised about all this activity.
>>>>>>>
>>>>>>> I thought the conclusion at LSF/MM was that for Linux itself there
>>>>>>> is very little benefit in supporting this scheme.  It will massively
>>>>>>> fragment the supported base of devices and applications, while only
>>>>>>> having the benefit of supporting some Samsung legacy devices.
>>>>>>
>>>>>> FWIW,
>>>>>>
>>>>>> That wasn't my impression from that LSF/MM session, but once the
>>>>>> videos become available, folks can decide for themselves.
>>>>>
>>>>> There was no real discussion about zone size constraint on the zone
>>>>> storage BoF. Many discussions happened in the hallway track though.
>>>>
>>>> Right so no direct clear blockers mentioned at all during the BoF.
>>>
>>> Nor any clear OK.
>>
>> So what about creating a device-mapper target, that's taking npo2 drives and
>> makes them po2 drives for the FS layers? It will be very similar code to
>> dm-linear.
> 
> +1
> 
> This will simplify the support for FSes, at least for the initial drop (if
> accepted).
> 
> And more importantly, this will also allow addressing any potential
> problem with user space breaking because of the non power of 2 zone size.
> 
Seconded (or maybe thirded).

The changes to support npo2 in the block layer are pretty simple, and 
really I don't have an issue with those.
Then adding a device-mapper target transforming npo2 drives in po2 block 
devices should be pretty trivial.

And once that is in you can start arguing with the FS folks on
whether to implement it natively.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Javier González 3 years, 11 months ago
On 20.05.2022 08:07, Hannes Reinecke wrote:
>On 5/19/22 20:47, Damien Le Moal wrote:
>>On 5/19/22 16:34, Johannes Thumshirn wrote:
>>>On 19/05/2022 05:19, Damien Le Moal wrote:
>>>>On 5/19/22 12:12, Luis Chamberlain wrote:
>>>>>On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>>>>>>On 5/18/22 00:34, Theodore Ts'o wrote:
>>>>>>>On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>>>>>>I'm a little surprised about all this activity.
>>>>>>>>
>>>>>>>>I thought the conclusion at LSF/MM was that for Linux itself there
>>>>>>>>is very little benefit in supporting this scheme.  It will massively
>>>>>>>>fragment the supported base of devices and applications, while only
>>>>>>>>having the benefit of supporting some Samsung legacy devices.
>>>>>>>
>>>>>>>FWIW,
>>>>>>>
>>>>>>>That wasn't my impression from that LSF/MM session, but once the
>>>>>>>videos become available, folks can decide for themselves.
>>>>>>
>>>>>>There was no real discussion about zone size constraint on the zone
>>>>>>storage BoF. Many discussions happened in the hallway track though.
>>>>>
>>>>>Right so no direct clear blockers mentioned at all during the BoF.
>>>>
>>>>Nor any clear OK.
>>>
>>>So what about creating a device-mapper target, that's taking npo2 drives and
>>>makes them po2 drives for the FS layers? It will be very similar code to
>>>dm-linear.
>>
>>+1
>>
>>This will simplify the support for FSes, at least for the initial drop (if
>>accepted).
>>
>>And more importantly, this will also allow addressing any potential
>>problem with user space breaking because of the non power of 2 zone size.
>>
>Seconded (or maybe thirded).
>
>The changes to support npo2 in the block layer are pretty simple, and 
>really I don't have an issue with those.
>Then adding a device-mapper target transforming npo2 drives in po2 
>block devices should be pretty trivial.
>
>And once that is in you can start arguing with the FS folks on
>whether to implement it natively.
>

So you are suggesting adding support for !PO2 in the block layer and
then a dm to present the device as a PO2 to the FS? This at least
addresses the hole issue for raw zoned block devices, so it can be a
first step.

This said, it seems to me that the changes to the FS are not being a
real issue. In fact, we are exposing some bugs while we generalize the
zone size support.

Could you point out what the challenges in btrfs are in the current
patches, that it makes sense to add an extra dm layer?

Note that for F2FS there is no blocker. Jaegeuk picked the initial
patches, and he agreed to add native support.
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Johannes Thumshirn 3 years, 11 months ago
On 20/05/2022 08:27, Javier González wrote:
> So you are suggesting adding support for !PO2 in the block layer and
> then a dm to present the device as a PO2 to the FS? This at least
> addresses the hole issue for raw zoned block devices, so it can be a
> first step.
> 
> This said, it seems to me that the changes to the FS are not being a
> real issue. In fact, we are exposing some bugs while we generalize the
> zone size support.
> 
> Could you point out what the challenges in btrfs are in the current
> patches, that it makes sense to add an extra dm layer?

I personally don't like the padding we need to do for the super block.

As I've pointed out to Pankaj already, I don't think it is 100% powerfail
safe as of now. It could probably be made, but that would also involve
changing non-zoned btrfs code which we try to avoid as much as we can.

As Damien already said, we still have issues with the general zoned 
support in btrfs, just have a look at the list of open issues [1] we
have. 

[1] https://github.com/naota/linux/issues/
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Javier González 3 years, 11 months ago
On 20.05.2022 09:30, Johannes Thumshirn wrote:
>On 20/05/2022 08:27, Javier González wrote:
>> So you are suggesting adding support for !PO2 in the block layer and
>> then a dm to present the device as a PO2 to the FS? This at least
>> addresses the hole issue for raw zoned block devices, so it can be a
>> first step.
>>
>> This said, it seems to me that the changes to the FS are not being a
>> real issue. In fact, we are exposing some bugs while we generalize the
>> zone size support.
>>
>> Could you point out what the challenges in btrfs are in the current
>> patches, that it makes sense to add an extra dm layer?
>
>I personally don't like the padding we need to do for the super block.
>
>As I've pointed out to Pankaj already, I don't think it is 100% powerfail
>safe as of now. It could probably be made, but that would also involve
>changing non-zoned btrfs code which we try to avoid as much as we can.
>
>As Damien already said, we still have issues with the general zoned
>support in btrfs, just have a look at the list of open issues [1] we
>have.
>
Sounds good Johannes. I understand that the priority is to make btrfs
stable now, before introducing more variables. Let's stick to this and
then we can bring it back as the list of open issues becomes more
manageable.

>[1] https://github.com/naota/linux/issues/

Thanks for sharing this too. It is a good way to see where to help.
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Damien Le Moal 3 years, 11 months ago
On 5/20/22 15:27, Javier González wrote:
> On 20.05.2022 08:07, Hannes Reinecke wrote:
>> On 5/19/22 20:47, Damien Le Moal wrote:
>>> On 5/19/22 16:34, Johannes Thumshirn wrote:
>>>> On 19/05/2022 05:19, Damien Le Moal wrote:
>>>>> On 5/19/22 12:12, Luis Chamberlain wrote:
>>>>>> On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>>>>>>> On 5/18/22 00:34, Theodore Ts'o wrote:
>>>>>>>> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>>>>>>> I'm a little surprised about all this activity.
>>>>>>>>>
>>>>>>>>> I thought the conclusion at LSF/MM was that for Linux itself there
>>>>>>>>> is very little benefit in supporting this scheme.  It will massively
>>>>>>>>> fragment the supported base of devices and applications, while only
>>>>>>>>> having the benefit of supporting some Samsung legacy devices.
>>>>>>>>
>>>>>>>> FWIW,
>>>>>>>>
>>>>>>>> That wasn't my impression from that LSF/MM session, but once the
>>>>>>>> videos become available, folks can decide for themselves.
>>>>>>>
>>>>>>> There was no real discussion about zone size constraint on the zone
>>>>>>> storage BoF. Many discussions happened in the hallway track though.
>>>>>>
>>>>>> Right so no direct clear blockers mentioned at all during the BoF.
>>>>>
>>>>> Nor any clear OK.
>>>>
>>>> So what about creating a device-mapper target, that's taking npo2 drives and
>>>> makes them po2 drives for the FS layers? It will be very similar code to
>>>> dm-linear.
>>>
>>> +1
>>>
>>> This will simplify the support for FSes, at least for the initial drop (if
>>> accepted).
>>>
>>> And more importantly, this will also allow addressing any potential
>>> problem with user space breaking because of the non power of 2 zone size.
>>>
>> Seconded (or maybe thirded).
>>
>> The changes to support npo2 in the block layer are pretty simple, and 
>> really I don't have an issue with those.
>> Then adding a device-mapper target transforming npo2 drives in po2 
>> block devices should be pretty trivial.
>>
>> And once that is in you can start arguing with the FS folks on
>> whether to implement it natively.
>>
> 
> So you are suggesting adding support for !PO2 in the block layer and
> then a dm to present the device as a PO2 to the FS? This at least
> addresses the hole issue for raw zoned block devices, so it can be a
> first step.

Yes, and it also allows supporting these new !po2 devices without
regressions (read lack of) in the support at FS level.

> 
> This said, it seems to me that the changes to the FS are not being a
> real issue. In fact, we are exposing some bugs while we generalize the
> zone size support.

Not arguing with that. But since we are still stabilizing btrfs ZNS
support, adding more code right now is a little painful.

> 
> Could you point out what the challenges in btrfs are in the current
> patches, that it makes sense to add an extra dm layer?

See above. No real challenge, just needs to be done if a clear agreement
can be reached on zone size alignment constraints. As mentioned above, the
btrfs changes timing is not ideal right now though.

Also please do not forget applications that may expect a power of 2 zone
size. A dm-zsp2 would be a nice solution for these. So regardless of the
FS work, that new DM target will be *very* nice to have.

> 
> Note that for F2FS there is no blocker. Jaegeuk picked the initial
> patches, and he agreed to add native support.

And until that is done, f2fs will not work with these new !po2 devices...
Having the new dm will avoid that support fragmentation which I personally
really dislike. With the new dm, we can keep support for *all* zoned block
devices, albeit needing a different setup depending on the device. That is
not nice at all but at least there is a way to make things work continuously.

-- 
Damien Le Moal
Western Digital Research
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Pankaj Raghav 3 years, 11 months ago
On 5/20/22 08:41, Damien Le Moal wrote:
>>>>>
>>>>> So what about creating a device-mapper target, that's taking npo2 drives and
>>>>> makes them po2 drives for the FS layers? It will be very similar code to
>>>>> dm-linear.
>>>>
Keith and Adam had a similar suggestion to go create a device mapper
(dm-unholy) when we tried the po2 emulation[1].
>>>> +1
>>>>
>>>> This will simplify the support for FSes, at least for the initial drop (if
>>>> accepted).
>>>>
>>>> And more importantly, this will also allow addressing any potential
>>>> problem with user space breaking because of the non power of 2 zone size.
>>>>
>>> Seconded (or maybe thirded).
>>>
>>> The changes to support npo2 in the block layer are pretty simple, and 
>>> really I don't have an issue with those.
>>> Then adding a device-mapper target transforming npo2 drives in po2 
>>> block devices should be pretty trivial.
>>>
>>> And once that is in you can start arguing with the FS folks on
>>> whether to implement it natively.
>>>
>>
>> So you are suggesting adding support for !PO2 in the block layer and
>> then a dm to present the device as a PO2 to the FS? This at least
>> addresses the hole issue for raw zoned block devices, so it can be a
>> first step.
> 
> Yes, and it also allows supporting these new !po2 devices without
> regressions (read lack of) in the support at FS level.
> 
>>
>> This said, it seems to me that the changes to the FS are not being a
>> real issue. In fact, we are exposing some bugs while we generalize the
>> zone size support.
> 
> Not arguing with that. But since we are still stabilizing btrfs ZNS
> support, adding more code right now is a little painful.
> 
>>
>> Could you point out what the challenges in btrfs are in the current
>> patches, that it makes sense to add an extra dm layer?
> 
> See above. No real challenge, just needs to be done if a clear agreement
> can be reached on zone size alignment constraints. As mentioned above, the
> btrfs changes timing is not ideal right now though.
> 
> Also please do not forget applications that may expect a power of 2 zone
> size. A dm-zsp2 would be a nice solution for these. So regardless of the
> FS work, that new DM target will be *very* nice to have.
> 
>>
>> Note that for F2FS there is no blocker. Jaegeuk picked the initial
>> patches, and he agreed to add native support.
> 
> And until that is done, f2fs will not work with these new !po2 devices...
> Having the new dm will avoid that support fragmentation which I personally
> really dislike. With the new dm, we can keep support for *all* zoned block
> devices, albeit needing a different setup depending on the device. That is
> not nice at all but at least there is a way to make things work continuously.
> 

I see that many people in the community feel it is better to target the
dm layer for the initial support of npo2 devices. I can give it a shot
and maintain a native out-of-tree support for FSs for npo2 devices and
merge it upstream as we see fit later.
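
To make the dm idea concrete, here is a minimal userspace sketch of the
kind of sector remapping such a target would need. It is an illustration
only, not code from this series: the function names are hypothetical, and
it assumes the target exposes a zone size rounded up to the next power of
2 while the underlying device lays out its npo2 zones back to back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round v up to the next power of 2 (v must be > 0). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical remap: translate a sector on the emulated po2 device into
 * a sector on the underlying npo2 device. Returns false when the sector
 * falls into the emulated gap between the real zone size and the po2 size.
 */
static bool po2_to_npo2_sector(uint64_t sector, uint64_t dev_zone_sectors,
			       uint64_t *dev_sector)
{
	uint64_t po2_zone_sectors = roundup_pow2(dev_zone_sectors);
	uint64_t zone_no = sector / po2_zone_sectors;
	uint64_t offset = sector & (po2_zone_sectors - 1);

	if (offset >= dev_zone_sectors)
		return false;

	*dev_sector = zone_no * dev_zone_sectors + offset;
	return true;
}
```

With a device zone size of 96 sectors, the emulated zone size is 128, so
emulated sector 130 (zone 1, offset 2) lands on device sector 98, while
emulated sector 100 falls in the gap and is rejected.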

[1]
https://lore.kernel.org/all/20220311223032.GA2439@dhcp-10-100-145-180.wdc.com/
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by David Sterba 3 years, 11 months ago
On Fri, May 20, 2022 at 11:30:09AM +0200, Pankaj Raghav wrote:
> On 5/20/22 08:41, Damien Le Moal wrote:
> >> Note that for F2FS there is no blocker. Jaegeuk picked the initial
> >> patches, and he agreed to add native support.
> > 
> > And until that is done, f2fs will not work with these new !po2 devices...
> > Having the new dm will avoid that support fragmentation which I personally
> > really dislike. With the new dm, we can keep support for *all* zoned block
> > devices, albeit needing a different setup depending on the device. That is
> > not nice at all but at least there is a way to make things work continuously.
> 
> I see that many people in the community feel it is better to target the
> dm layer for the initial support of npo2 devices. I can give it a shot
> and maintain a native out-of-tree support for FSs for npo2 devices and
> merge it upstream as we see fit later.

Some of the changes from your patchset are cleanups or abstracting the
alignment and zone calculations, so this can be merged to minimize the
out of tree code.
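
As an illustration of what "abstracting the alignment and zone
calculations" can look like, here is a minimal userspace sketch. The
helper names are hypothetical and are not the ones used in the patchset;
the point is only that callers stop open-coding shifts and masks that
assume a po2 zone size.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Index of the zone containing @sector, for any non-zero zone size. */
static uint64_t zone_no(uint64_t sector, uint64_t zone_sectors)
{
	return sector / zone_sectors;
}

/* True if @sector sits exactly on a zone boundary. */
static bool zone_aligned(uint64_t sector, uint64_t zone_sectors)
{
	if (is_pow2(zone_sectors))
		return (sector & (zone_sectors - 1)) == 0; /* po2 fast path */
	return sector % zone_sectors == 0;
}

/* Start sector of the zone containing @sector. */
static uint64_t zone_start(uint64_t sector, uint64_t zone_sectors)
{
	return zone_no(sector, zone_sectors) * zone_sectors;
}
```

Once every caller goes through helpers like these, switching between po2
and npo2 zone sizes only touches the helpers themselves.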
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Pankaj Raghav 3 years, 11 months ago
On 5/20/22 19:18, David Sterba wrote:
>> I see that many people in the community feel it is better to target the
>> dm layer for the initial support of npo2 devices. I can give it a shot
>> and maintain a native out-of-tree support for FSs for npo2 devices and
>> merge it upstream as we see fit later.
> 
> Some of the changes from your patchset are cleanups or abstracting the
> alignment and zone calculations, so this can be merged to minimize the
> out of tree code.
Sounds good. I will send it separately. Thanks.
Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Javier González 3 years, 11 months ago
On 20.05.2022 15:41, Damien Le Moal wrote:
>On 5/20/22 15:27, Javier González wrote:
>> On 20.05.2022 08:07, Hannes Reinecke wrote:
>>> On 5/19/22 20:47, Damien Le Moal wrote:
>>>> On 5/19/22 16:34, Johannes Thumshirn wrote:
>>>>> On 19/05/2022 05:19, Damien Le Moal wrote:
>>>>>> On 5/19/22 12:12, Luis Chamberlain wrote:
>>>>>>> On Thu, May 19, 2022 at 12:08:26PM +0900, Damien Le Moal wrote:
>>>>>>>> On 5/18/22 00:34, Theodore Ts'o wrote:
>>>>>>>>> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
>>>>>>>>>> I'm a little surprised about all this activity.
>>>>>>>>>>
>>>>>>>>>> I thought the conclusion at LSF/MM was that for Linux itself there
>>>>>>>>>> is very little benefit in supporting this scheme.  It will massively
>>>>>>>>>> fragment the supported base of devices and applications, while only
>>>>>>>>>> having the benefit of supporting some Samsung legacy devices.
>>>>>>>>>
>>>>>>>>> FWIW,
>>>>>>>>>
>>>>>>>>> That wasn't my impression from that LSF/MM session, but once the
>>>>>>>>> videos become available, folks can decide for themselves.
>>>>>>>>
>>>>>>>> There was no real discussion about zone size constraint on the zone
>>>>>>>> storage BoF. Many discussions happened in the hallway track though.
>>>>>>>
>>>>>>> Right so no direct clear blockers mentioned at all during the BoF.
>>>>>>
>>>>>> Nor any clear OK.
>>>>>
>>>>> So what about creating a device-mapper target, that's taking npo2 drives and
>>>>> makes them po2 drives for the FS layers? It will be very similar code to
>>>>> dm-linear.
>>>>
>>>> +1
>>>>
>>>> This will simplify the support for FSes, at least for the initial drop (if
>>>> accepted).
>>>>
>>>> And more importantly, this will also allow addressing any potential
>>>> problem with user space breaking because of the non power of 2 zone size.
>>>>
>>> Seconded (or maybe thirded).
>>>
>>> The changes to support npo2 in the block layer are pretty simple, and
>>> really I don't have an issue with those.
>>> Then adding a device-mapper target transforming npo2 drives in po2
>>> block devices should be pretty trivial.
>>>
>>> And once that is in you can start arguing with the FS folks on
>>> whether to implement it natively.
>>>
>>
>> So you are suggesting adding support for !PO2 in the block layer and
>> then a dm to present the device as a PO2 to the FS? This at least
>> addresses the hole issue for raw zoned block devices, so it can be a
>> first step.
>
>Yes, and it also allows supporting these new !po2 devices without
>regressions (read lack of) in the support at FS level.
>
>>
>> This said, it seems to me that the changes to the FS are not a
>> real issue. In fact, we are exposing some bugs while we generalize the
>> zone size support.
>
>Not arguing with that. But since we are still stabilizing btrfs ZNS
>support, adding more code right now is a little painful.
>
>>
>> Could you point out what the challenges in btrfs are in the current
>> patches, that it makes sense to add an extra dm layer?
>
>See above. No real challenge, just needs to be done if a clear agreement
>can be reached on zone size alignment constraints. As mentioned above, the
>btrfs changes timing is not ideal right now though.
>
>Also please do not forget applications that may expect a power of 2 zone
>size. A dm-zsp2 would be a nice solution for these. So regardless of the
>FS work, that new DM target will be *very* nice to have.
>
>>
>> Note that for F2FS there is no blocker. Jaegeuk picked the initial
>> patches, and he agreed to add native support.
>
>And until that is done, f2fs will not work with these new !po2 devices...
>Having the new dm will avoid that support fragmentation which I personally
>really dislike. With the new dm, we can keep support for *all* zoned block
>devices, albeit needing a different setup depending on the device. That is
>not nice at all but at least there is a way to make things work continuously.

All the above sounds very reasonable. Thanks Damien.

If we all can agree, we can address this in the next version and
maintain the native FS support out-of-tree until you see that general btrfs
support for zoned devices is stable. We will be happy to help with this
too.

Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Luis Chamberlain 3 years, 11 months ago
On Tue, May 17, 2022 at 11:34:54AM -0400, Theodore Ts'o wrote:
> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
> > I'm a little surprised about all this activity.
> > 
> > I thought the conclusion at LSF/MM was that for Linux itself there
> > is very little benefit in supporting this scheme.  It will massively
> > fragment the supported base of devices and applications, while only
> > having the benefit of supporting some Samsung legacy devices.
> 
> FWIW,
> 
> That wasn't my impression from that LSF/MM session, but once the
> videos become available, folks can decide for themselves.

Agreed, contrary to conventional storage devices, with the zone storage
ecosystem we simply have a requirement of zone drive replacements matching
zone size. That requirement exists for po2 or npo2. The work in this patch
set proves that supporting npo2 was in the end straightforward. As the one
putting together the BoF I can say that there were no sticking points raised
to move forward with this when the topic came up. So I am very surprised to
hear about any other perceived conclusion.

  Luis
Re: [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Javier González 3 years, 11 months ago
On 17.05.2022 10:10, Christoph Hellwig wrote:
>I'm a little surprised about all this activity.
>
>I thought the conclusion at LSF/MM was that for Linux itself there
>is very little benefit in supporting this scheme.  It will massively
>fragment the supported base of devices and applications, while only
>having the benefit of supporting some Samsung legacy devices.

I believed we had agreed that non-power-of-2 zoned devices were something
to explore. Let me summarize the 3 main points we covered at different
times at LSF/MM:

   - This is not for legacy Samsung ZNS devices. At least 4 other
     vendors have reported building non-power-of-2 ZNS devices to meet
     customer demands on removing holes in the address space. It seems
     like there will be more ZNS devices with size=capacity out there
     than with PO2 sizes. Block device and FS support is very desirable
     for these.

   - We also talked about how the capacity not being a PO2 is the one
     introducing the fragmentation, as applications that already worked
     with SMR HDDs will have to change their data placement policy. The
     size is just a construction, but the real work is adopting the
     capacity.

   - Besides the previous point, the fragmentation will happen from the
     moment we have available devices. This is not a kernel-only issue.
     We have SMR, ZNS, and soon another spec for zone devices. I
     understood that as long as we do not break any existing support, we
     would be able to expand the zoned ecosystem in Linux.
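
To put numbers on the "holes" mentioned above, here is a small
illustrative sketch. The helper names and the 96 MiB / 128 MiB figures
are made up for the example and do not describe any particular device.

```c
#include <stdint.h>

/* Sectors per zone that are addressable but not writable (the hole). */
static uint64_t zone_hole_sectors(uint64_t zone_size, uint64_t zone_cap)
{
	return zone_size - zone_cap;
}

/* First sector past the writable area of zone @zone_no. */
static uint64_t zone_writable_end(uint64_t zone_no, uint64_t zone_size,
				  uint64_t zone_cap)
{
	return zone_no * zone_size + zone_cap;
}
```

With 512-byte sectors, a 96 MiB capacity (196608 sectors) inside a
128 MiB po2 zone (262144 sectors) leaves a hole of 65536 sectors (32 MiB)
in every zone. With size=capacity the hole is 0 and the writable area of
one zone ends exactly where the next zone starts. In both cases the
application has to place data according to the capacity, which is the
point made in the second item.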

>So my impression was that this work, while technically feasible, is
>rather useless.  So unless I missed something important I have no
>interest in supporting this in NVMe.

Does the above help you reconsider your interest in supporting this
in NVMe?
Re: [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Christoph Hellwig 3 years, 11 months ago
On Tue, May 17, 2022 at 11:18:34AM +0200, Javier González wrote:
> Does the above help you reconsider your interest in supporting this
> in NVMe?

Very little.  It just seems like a really bad idea.
Re: [PATCH v4 00/13] support non power of 2 zoned devices
Posted by Javier González 3 years, 11 months ago
> On 18 May 2022, at 10.16, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Tue, May 17, 2022 at 11:18:34AM +0200, Javier González wrote:
>> Does the above help you reconsider your interest in supporting this
>> in NVMe?
> 
> Very little.  It just seems like a really bad idea.

I understand you don’t like this, but I still hope you see value in supporting it. We are getting close to a very minimal patchset, which is also helping to fix bugs in the zoned stack.

If you take a look at the last version and give some feedback, I’m sure we can end up with a good solution.

Can you help?