[PATCH] zonefs: do not use append if device does not support it

Andreas Hindborg posted 1 patch 2 years, 7 months ago
fs/zonefs/file.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[PATCH] zonefs: do not use append if device does not support it
Posted by Andreas Hindborg 2 years, 7 months ago
From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>

Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
io vector is truncated to zero. It also causes a call to
`invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
not intentional. Thus, do not use append when device does not support it.

Signed-off-by: Andreas Hindborg (Samsung) <nmi@metaspace.dk>
---
 fs/zonefs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f..c97fe2aa20b0 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -536,9 +536,11 @@ static ssize_t zonefs_write_checks(struct kiocb *iocb, struct iov_iter *from)
 static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct block_device *bdev = inode->i_sb->s_bdev;
 	struct zonefs_inode_info *zi = ZONEFS_I(inode);
 	struct zonefs_zone *z = zonefs_inode_zone(inode);
 	struct super_block *sb = inode->i_sb;
+	unsigned int max_append = bdev_max_zone_append_sectors(bdev);
 	bool sync = is_sync_kiocb(iocb);
 	bool append = false;
 	ssize_t ret, count;
@@ -581,7 +583,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 		append = sync;
 	}
 
-	if (append) {
+	if (append && max_append) {
 		ret = zonefs_file_dio_append(iocb, from);
 	} else {
 		/*

base-commit: 45a3e24f65e90a047bef86f927ebdc4c710edaa1
-- 
2.41.0
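
Editor's illustration (not part of the patch): the failure described in the commit message comes down to two pieces of arithmetic. The sketch below is a userspace stand-in, not the zonefs code. The helper names `clamp_to_append_limit()` and `last_byte_offset()` are hypothetical, and the 64-bit types are an assumption made for the example. The kernel code uses its own types, which is why the commit message reports UINT_MAX rather than the 64-bit maximum seen here.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Hypothetical stand-in for the truncation step: clamp a request of
 * `count` bytes to the zone append limit, which is given in sectors.
 * A limit of zero sectors clamps every request to zero bytes.
 */
static uint64_t clamp_to_append_limit(uint64_t count,
				      uint32_t max_append_sectors)
{
	uint64_t max_bytes = (uint64_t)max_append_sectors << SECTOR_SHIFT;

	return count < max_bytes ? count : max_bytes;
}

/*
 * Hypothetical stand-in for computing the last byte touched by a write
 * of `count` bytes at `pos`. With pos == 0 and count == 0 the unsigned
 * subtraction wraps around to the maximum value of the type.
 */
static uint64_t last_byte_offset(uint64_t pos, uint64_t count)
{
	return pos + count - 1;
}
```

With a limit of 0 sectors, a 4096-byte request is clamped to 0 bytes, and the end of the invalidation range computed from that empty request wraps around instead of staying near `pos`.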
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Christoph Hellwig 2 years, 7 months ago
On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote:
> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
> 
> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
> io vector is truncated to zero. It also causes a call to
> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
> not intentional. Thus, do not use append when device does not support it.

How do you even manage to hit this code?  Zone Append is a mandatory
feature and drivers need to check that it is available.
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Damien Le Moal 2 years, 7 months ago
On 6/27/23 12:45, Christoph Hellwig wrote:
> On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote:
>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
>>
>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
>> io vector is truncated to zero. It also causes a call to
>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
>> not intentional. Thus, do not use append when device does not support it.
> 
> How do you even manage to hit this code?  Zone Append is a mandatory
> feature and drivers need to check that it is available.

The ublk driver is probably missing that check? I have not looked at its zone
support code.

But thinking of it, we probably would be better off having a generic check for
"q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().

-- 
Damien Le Moal
Western Digital Research
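
Editor's illustration (not from the thread): a generic check of the kind suggested above could reject a zoned device that reports a zero zone append limit before any filesystem sees it. The sketch below is a simplified userspace model. `struct fake_queue_limits` and `validate_zoned_limits()` are hypothetical names, and the choice of `-ENODEV` as the error code is an assumption. It does not show the real `blk_revalidate_disk_zones()` code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a request queue's limits. */
struct fake_queue_limits {
	bool zoned;                           /* device is a zoned block device */
	unsigned int max_zone_append_sectors; /* 0 means "no zone append" */
};

/*
 * Return 0 if the limits are acceptable, or a negative errno otherwise.
 * Non-zoned devices are not affected; zoned devices must advertise a
 * non-zero zone append limit.
 */
static int validate_zoned_limits(const struct fake_queue_limits *lim)
{
	if (!lim->zoned)
		return 0;

	if (lim->max_zone_append_sectors == 0)
		return -ENODEV; /* assumed error code for this sketch */

	return 0;
}
```

Placing the check in common code means a driver that forgets to set the limit fails at revalidation time, rather than at the first direct write issued by a filesystem.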
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Andreas Hindborg (Samsung) 2 years, 7 months ago
Damien Le Moal <dlemoal@kernel.org> writes:

> On 6/27/23 12:45, Christoph Hellwig wrote:
>> On Mon, Jun 26, 2023 at 06:47:52PM +0200, Andreas Hindborg wrote:
>>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
>>>
>>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
>>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
>>> io vector is truncated to zero. It also causes a call to
>>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
>>> not intentional. Thus, do not use append when device does not support it.
>> 
>> How do you even manage to hit this code?  Zone Append is a mandatory
>> feature and drivers need to check that it is available.
>
> The ublk driver is probably missing that check? I have not looked at its zone
> support code.
>
> But thinking of it, we probably would be better off having a generic check for
> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().

I was playing with ublk zone support. It seems I made it buggy by
allowing zone append size to go to zero.

Adding the check would be a nice help to people like me that will
implement whatever in their driver :)

Best regards
Andreas
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Christoph Hellwig 2 years, 7 months ago
On Tue, Jun 27, 2023 at 01:45:38PM +0900, Damien Le Moal wrote:
> But thinking of it, we probably would be better off having a generic check for
> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().

Agreed.
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Damien Le Moal 2 years, 7 months ago
On 6/27/23 13:48, Christoph Hellwig wrote:
> On Tue, Jun 27, 2023 at 01:45:38PM +0900, Damien Le Moal wrote:
>> But thinking of it, we probably would be better off having a generic check for
>> "q->limits.max_zone_append_sectors != 0" in blk_revalidate_disk_zones().
> 
> Agreed.

I'll send something.

-- 
Damien Le Moal
Western Digital Research
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Johannes Thumshirn 2 years, 7 months ago
On 26.06.23 18:47, Andreas Hindborg wrote:
> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
> 
> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
> io vector is truncated to zero. It also causes a call to
> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
> not intentional. Thus, do not use append when device does not support it.
> 

I'm sorry but I think it has been stated often enough that for Linux Zone Append
is a mandatory feature for a Zoned Block Device. Therefore this path is essentially
dead code as max_zone_append_sectors will always be greater than zero.

So this is a clear NAK from my side.


Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Andreas Hindborg (Samsung) 2 years, 7 months ago
Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes:

> On 26.06.23 18:47, Andreas Hindborg wrote:
>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
>> 
>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
>> io vector is truncated to zero. It also causes a call to
>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
>> not intentional. Thus, do not use append when device does not support it.
>> 
>
> I'm sorry but I think it has been stated often enough that for Linux Zone Append
> is a mandatory feature for a Zoned Block Device. Therefore this path is essentially
> dead code as max_zone_append_sectors will always be greater than zero.
>
> So this is a clear NAK from my side.

OK, thanks for clarifying 👍 I came across this bugging out while
playing around with zone append for ublk. The code makes sense if the
stack expects append to always be present.

I didn't follow the discussion. Could you reiterate why the policy is
that zoned devices _must_ support append?

Best regards,
Andreas
Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Damien Le Moal 2 years, 7 months ago
On 6/27/23 03:23, Andreas Hindborg (Samsung) wrote:
> 
> Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes:
> 
>> On 26.06.23 18:47, Andreas Hindborg wrote:
>>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
>>>
>>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
>>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
>>> io vector is truncated to zero. It also causes a call to
>>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
>>> not intentional. Thus, do not use append when device does not support it.
>>>
>>
>> I'm sorry but I think it has been stated often enough that for Linux Zone Append
>> is a mandatory feature for a Zoned Block Device. Therefore this path is essentially
>> dead code as max_zone_append_sectors will always be greater than zero.
>>
>> So this is a clear NAK from my side.
> 
> OK, thanks for clarifying 👍 I came across this bugging out while
> playing around with zone append for ublk. The code makes sense if the
> stack expects append to always be present.
> 
> I didn't follow the discussion. Could you reiterate why the policy is
> that zoned devices _must_ support append?

To avoid support fragmentation and for performance. btrfs zoned block device
support requires zone append and using that command makes writes much faster as
we do not have to go through zone locking.
Note that for zonefs, I plan to add async zone append support as well, linked
with O_APPEND use to further improve write performance with ZNS drives.
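
Editor's illustration (not from the thread): the difference between a regular zone write and a zone append can be modelled in a few lines. The toy model below is a sketch under stated assumptions. `struct toy_zone`, `toy_zone_write()` and `toy_zone_append()` are hypothetical names, and the model ignores concurrency, zone states and real error codes. It only shows that a regular write must name the exact write pointer position, while an append names the zone and learns the position afterwards.

```c
#include <stdint.h>

/* Toy model of one sequential-write-required zone. */
struct toy_zone {
	uint64_t start;    /* first sector of the zone */
	uint64_t capacity; /* number of writable sectors */
	uint64_t wp;       /* write pointer: next sector to be written */
};

/*
 * Regular write: the caller must target the write pointer exactly, so
 * concurrent writers have to be serialized by the caller.
 * Returns 0 on success, -1 on a misplaced or oversized write.
 */
static int toy_zone_write(struct toy_zone *z, uint64_t sector, uint64_t nr)
{
	if (sector != z->wp)
		return -1;
	if (z->wp - z->start + nr > z->capacity)
		return -1;
	z->wp += nr;
	return 0;
}

/*
 * Zone append: the caller only names the zone. The device picks the
 * location (the current write pointer) and reports it back.
 * Returns 0 on success, -1 if the data does not fit in the zone.
 */
static int toy_zone_append(struct toy_zone *z, uint64_t nr,
			   uint64_t *written_at)
{
	if (z->wp - z->start + nr > z->capacity)
		return -1;
	*written_at = z->wp;
	z->wp += nr;
	return 0;
}
```

In this model a caller of `toy_zone_write()` has to know `wp` before issuing the write, which is what forces the locking mentioned above. A caller of `toy_zone_append()` does not need that knowledge up front.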

> 
> Best regards,
> Andreas
> 

-- 
Damien Le Moal
Western Digital Research

Re: [PATCH] zonefs: do not use append if device does not support it
Posted by Andreas Hindborg (Samsung) 2 years, 7 months ago
Damien Le Moal <dlemoal@kernel.org> writes:

> On 6/27/23 03:23, Andreas Hindborg (Samsung) wrote:
>> 
>> Johannes Thumshirn <Johannes.Thumshirn@wdc.com> writes:
>> 
>>> On 26.06.23 18:47, Andreas Hindborg wrote:
>>>> From: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
>>>>
>>>> Zonefs will try to use `zonefs_file_dio_append()` for direct sync writes even if
>>>> device `max_zone_append_sectors` is zero. This will cause the IO to fail as the
>>>> io vector is truncated to zero. It also causes a call to
>>>> `invalidate_inode_pages2_range()` with end set to UINT_MAX, which is probably
>>>> not intentional. Thus, do not use append when device does not support it.
>>>>
>>>
>>> I'm sorry but I think it has been stated often enough that for Linux Zone Append
>>> is a mandatory feature for a Zoned Block Device. Therefore this path is essentially
>>> dead code as max_zone_append_sectors will always be greater than zero.
>>>
>>> So this is a clear NAK from my side.
>> 
>> OK, thanks for clarifying 👍 I came across this bugging out while
>> playing around with zone append for ublk. The code makes sense if the
>> stack expects append to always be present.
>> 
>> I didn't follow the discussion. Could you reiterate why the policy is
>> that zoned devices _must_ support append?
>
> To avoid support fragmentation and for performance. btrfs zoned block device
> support requires zone append and using that command makes writes much faster as
> we do not have to go through zone locking.
> Note that for zonefs, I plan to add async zone append support as well, linked
> with O_APPEND use to further improve write performance with ZNS drives.
>

Thanks for clarifying, Damien 👍

BR Andreas