fs/erofs/inode.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-)
Fast symlink can be used if the on-disk symlink data is stored
in the same block as the on-disk inode, so we don’t need to trigger
another I/O for symlink data. However, correctly fs correction could be
reported _incorrectly_ if inode xattrs are too large.
In fact, these should be valid images although they cannot be handled as
fast symlinks.
Many thanks to Colin for reporting this!
Reported-by: Colin Walters <walters@verbum.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
fs/erofs/inode.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 5f6439a63af7..79a29841ae1c 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -178,12 +178,13 @@ static int erofs_fill_symlink(struct inode *inode, void *kaddr,
unsigned int m_pofs)
{
struct erofs_inode *vi = EROFS_I(inode);
- unsigned int bsz = i_blocksize(inode);
+ loff_t off;
char *lnk;
- /* if it cannot be handled with fast symlink scheme */
- if (vi->datalayout != EROFS_INODE_FLAT_INLINE ||
- inode->i_size >= bsz || inode->i_size < 0) {
+ /* check if it cannot be handled with fast symlink scheme */
+ if (vi->datalayout != EROFS_INODE_FLAT_INLINE || inode->i_size < 0 ||
+ check_add_overflow(m_pofs + vi->xattr_isize, inode->i_size, &off) ||
+ off > i_blocksize(inode)) {
inode->i_op = &erofs_symlink_iops;
return 0;
}
@@ -192,16 +193,6 @@ static int erofs_fill_symlink(struct inode *inode, void *kaddr,
if (!lnk)
return -ENOMEM;
- m_pofs += vi->xattr_isize;
- /* inline symlink data shouldn't cross block boundary */
- if (m_pofs + inode->i_size > bsz) {
- kfree(lnk);
- erofs_err(inode->i_sb,
- "inline data cross block boundary @ nid %llu",
- vi->nid);
- DBG_BUGON(1);
- return -EFSCORRUPTED;
- }
memcpy(lnk, kaddr + m_pofs, inode->i_size);
lnk[inode->i_size] = '\0';
--
2.43.5
On 2024/9/9 10:28, Gao Xiang wrote: > Fast symlink can be used if the on-disk symlink data is stored > in the same block as the on-disk inode, so we don’t need to trigger > another I/O for symlink data. However, correctly fs correction could be > reported _incorrectly_ if inode xattrs are too large. > > In fact, these should be valid images although they cannot be handled as > fast symlinks. > > Many thanks to Colin for reporting this! > > Reported-by: Colin Walters <walters@verbum.org> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Fixes: 431339ba9042 ("staging: erofs: add inode operations") Thanks, Gao Xiang
Fast symlink can be used if the on-disk symlink data is stored
in the same block as the on-disk inode, so we don’t need to trigger
another I/O for symlink data. However, correctly fs correction could be
reported _incorrectly_ if inode xattrs are too large.
In fact, these should be valid images although they cannot be handled as
fast symlinks.
Many thanks to Colin for reporting this!
Reported-by: Colin Walters <walters@verbum.org>
Fixes: 431339ba9042 ("staging: erofs: add inode operations")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
v2:
- sent out a wrong version (`m_pofs += vi->xattr_isize` was missed).
fs/erofs/inode.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 5f6439a63af7..f2cab9e4f3bc 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -178,12 +178,14 @@ static int erofs_fill_symlink(struct inode *inode, void *kaddr,
unsigned int m_pofs)
{
struct erofs_inode *vi = EROFS_I(inode);
- unsigned int bsz = i_blocksize(inode);
+ loff_t off;
char *lnk;
- /* if it cannot be handled with fast symlink scheme */
- if (vi->datalayout != EROFS_INODE_FLAT_INLINE ||
- inode->i_size >= bsz || inode->i_size < 0) {
+ m_pofs += vi->xattr_isize;
+ /* check if it cannot be handled with fast symlink scheme */
+ if (vi->datalayout != EROFS_INODE_FLAT_INLINE || inode->i_size < 0 ||
+ check_add_overflow(m_pofs, inode->i_size, &off) ||
+ off > i_blocksize(inode)) {
inode->i_op = &erofs_symlink_iops;
return 0;
}
@@ -192,16 +194,6 @@ static int erofs_fill_symlink(struct inode *inode, void *kaddr,
if (!lnk)
return -ENOMEM;
- m_pofs += vi->xattr_isize;
- /* inline symlink data shouldn't cross block boundary */
- if (m_pofs + inode->i_size > bsz) {
- kfree(lnk);
- erofs_err(inode->i_sb,
- "inline data cross block boundary @ nid %llu",
- vi->nid);
- DBG_BUGON(1);
- return -EFSCORRUPTED;
- }
memcpy(lnk, kaddr + m_pofs, inode->i_size);
lnk[inode->i_size] = '\0';
--
2.43.5
On Sun, Sep 8, 2024, at 11:19 PM, Gao Xiang wrote: > Fast symlink can be used if the on-disk symlink data is stored > in the same block as the on-disk inode, so we don’t need to trigger > another I/O for symlink data. However, correctly fs correction could be > reported _incorrectly_ if inode xattrs are too large. > > In fact, these should be valid images although they cannot be handled as > fast symlinks. > > Many thanks to Colin for reporting this! Yes, though feel free to also add Reported-by: https://honggfuzz.dev/ or so...not totally sure how to credit it "kernel style" bit honggfuzz very much helped me find this bug (indirectly). > > Reported-by: Colin Walters <walters@verbum.org> > Fixes: 431339ba9042 ("staging: erofs: add inode operations") > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > --- > v2: > - sent out a wrong version (`m_pofs += vi->xattr_isize` was missed). > > fs/erofs/inode.c | 20 ++++++-------------- > 1 file changed, 6 insertions(+), 14 deletions(-) > > diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c > index 5f6439a63af7..f2cab9e4f3bc 100644 > --- a/fs/erofs/inode.c > +++ b/fs/erofs/inode.c > @@ -178,12 +178,14 @@ static int erofs_fill_symlink(struct inode > *inode, void *kaddr, > unsigned int m_pofs) > { > struct erofs_inode *vi = EROFS_I(inode); > - unsigned int bsz = i_blocksize(inode); > + loff_t off; > char *lnk; > > - /* if it cannot be handled with fast symlink scheme */ > - if (vi->datalayout != EROFS_INODE_FLAT_INLINE || > - inode->i_size >= bsz || inode->i_size < 0) { > + m_pofs += vi->xattr_isize; > + /* check if it cannot be handled with fast symlink scheme */ > + if (vi->datalayout != EROFS_INODE_FLAT_INLINE || inode->i_size < 0 || > + check_add_overflow(m_pofs, inode->i_size, &off) || > + off > i_blocksize(inode)) { > inode->i_op = &erofs_symlink_iops; > return 0; > } This change LGTM. > @@ -192,16 +194,6 @@ static int erofs_fill_symlink(struct inode *inode, > void *kaddr, > if (!lnk) > return -ENOMEM; > > - m_pofs += vi->xattr_isize; > - /* inline symlink data shouldn't cross block boundary */ > - if (m_pofs + inode->i_size > bsz) { It can cross a boundary but it still can't be *bigger* right? IOW do we still need to check here for inode->i_size > bsz or is that handled by something else generic?
Hi Colin, On 2024/9/9 20:48, Colin Walters wrote: > > > On Sun, Sep 8, 2024, at 11:19 PM, Gao Xiang wrote: >> Fast symlink can be used if the on-disk symlink data is stored >> in the same block as the on-disk inode, so we don’t need to trigger >> another I/O for symlink data. However, correctly fs correction could be >> reported _incorrectly_ if inode xattrs are too large. >> >> In fact, these should be valid images although they cannot be handled as >> fast symlinks. >> >> Many thanks to Colin for reporting this! > > Yes, though feel free to also add > Reported-by: https://honggfuzz.dev/ > or so...not totally sure how to credit it "kernel style" bit honggfuzz very much helped me find this bug (indirectly). Ok, it sounds good to me. I will add this later. > > > >> >> Reported-by: Colin Walters <walters@verbum.org> >> Fixes: 431339ba9042 ("staging: erofs: add inode operations") >> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> >> --- >> v2: >> - sent out a wrong version (`m_pofs += vi->xattr_isize` was missed). >> >> fs/erofs/inode.c | 20 ++++++-------------- >> 1 file changed, 6 insertions(+), 14 deletions(-) >> >> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c >> index 5f6439a63af7..f2cab9e4f3bc 100644 >> --- a/fs/erofs/inode.c >> +++ b/fs/erofs/inode.c >> @@ -178,12 +178,14 @@ static int erofs_fill_symlink(struct inode >> *inode, void *kaddr, >> unsigned int m_pofs) >> { >> struct erofs_inode *vi = EROFS_I(inode); >> - unsigned int bsz = i_blocksize(inode); >> + loff_t off; >> char *lnk; >> >> - /* if it cannot be handled with fast symlink scheme */ >> - if (vi->datalayout != EROFS_INODE_FLAT_INLINE || >> - inode->i_size >= bsz || inode->i_size < 0) { >> + m_pofs += vi->xattr_isize; >> + /* check if it cannot be handled with fast symlink scheme */ >> + if (vi->datalayout != EROFS_INODE_FLAT_INLINE || inode->i_size < 0 || >> + check_add_overflow(m_pofs, inode->i_size, &off) || >> + off > i_blocksize(inode)) { >> inode->i_op = &erofs_symlink_iops; >> return 0; >> } > > This change LGTM. > >> @@ -192,16 +194,6 @@ static int erofs_fill_symlink(struct inode *inode, >> void *kaddr, >> if (!lnk) >> return -ENOMEM; >> >> - m_pofs += vi->xattr_isize; >> - /* inline symlink data shouldn't cross block boundary */ >> - if (m_pofs + inode->i_size > bsz) { > > It can cross a boundary but it still can't be *bigger* right? IOW do we still need to check here for inode->i_size > bsz or is that handled by something else generic? It can be bigger. Like ext4, EROFS supports PAGE_SIZE symlink via page_get_link() (non-fastsymlink cases), but mostly consider this as 4KiB though regardless of on-disk block sizes. Thanks, Gao Xiang
On Mon, Sep 9, 2024, at 9:21 AM, Gao Xiang wrote: > > It can be bigger. > > Like ext4, EROFS supports PAGE_SIZE symlink via page_get_link() > (non-fastsymlink cases), but mostly consider this as 4KiB though > regardless of on-disk block sizes. But symlink targets can't be bigger than PATH_MAX which has always been 4KiB right? (Does Linux support systems with sub-4KiB pages?) I guess let me ask it a different way: Since we're removing a sanity check here I just want to be sure that the constraints are still handled. Hmm, skimming through the vfs code from vfs_readlink() I'm not seeing anything constraining the userspace buffer length which seems surprising. Ah interesting, XFS has #define XFS_SYMLINK_MAXLEN 1024 and always constrains even its kmalloc invocation to that.
On 2024/9/9 21:58, Colin Walters wrote: > > > On Mon, Sep 9, 2024, at 9:21 AM, Gao Xiang wrote: >> >> It can be bigger. >> >> Like ext4, EROFS supports PAGE_SIZE symlink via page_get_link() >> (non-fastsymlink cases), but mostly consider this as 4KiB though >> regardless of on-disk block sizes. Let me rephrase it more clearer... For each EROFS logical block read (e.g 4KiB), EROFS will only generate one filesystem physical block read instead of two or more physical block read. Symlink files just like regular files, so it doesn't allow the inline tail crosses physical block boundary (which means an 8KiB I/O is needed), see: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v6.10#n103 In that case, EROFS_INODE_FLAT_INLINE (2) shouldn't be used and EROFS_INODE_FLAT_PLAIN(0) need to be used instead in order to contain full data. But the fast symlink handling has an issue: it should fall back to non-fastsymlink path instead of erroring out directly. > > But symlink targets can't be bigger than PATH_MAX which has always been 4KiB right? (Does Linux support systems with sub-4KiB pages?) Yes, I think 4KiB is the minimum page size of Linux, so in practice the maximum size of generic Linux is PATH_MAX(4KiB). > > I guess let me ask it a different way: Since we're removing a sanity check here I just want to be sure that the constraints are still handled. It seems page_get_link() doesn't check this, but it will add trailing `\0` to the end buffer (PAGE_SIZE - 1), so at least it won't crash the kernel. > > Hmm, skimming through the vfs code from vfs_readlink() I'm not seeing anything constraining the userspace buffer length which seems surprising. > > Ah interesting, XFS has > #define XFS_SYMLINK_MAXLEN 1024 > and always constrains even its kmalloc invocation to that. Not quite sure about hard limitation in EROFS runtime, we could define #define EROFS_SYMLINK_MAXLEN 4096 But since symlink i_size > 4096 only due to crafted images (and not generated by mkfs) and not crash, so either way (to check or not check in kernel) is okay to me. Thanks, Gao Xiang >
On Mon, Sep 9, 2024, at 10:14 AM, Gao Xiang wrote: > > Not quite sure about hard limitation in EROFS > runtime, we could define > > #define EROFS_SYMLINK_MAXLEN 4096 Not sure that a new define is needed versus just reusing PATH_MAX, but that's obviously just a style thing that's much more your call than mine. > But since symlink i_size > 4096 only due to crafted > images (and not generated by mkfs) and not crash, so > either way (to check or not check in kernel) is okay > to me. Yes, but my understanding was that EROFS (in contrast to other kernel read-write filesystems which are more complicated) was aiming to be robust against potentially malicious images. Crafted/malicious images aside, there's also the IMO obvious angle here that we should avoid crashes or worse out-of-bound read/write if there happens to be *accidental* on-disk/memory corruption and having high bit(s) flip in a symlink inode size seems like an easy one to handle. Skimming the XFS code for example it looks like it's pretty robust in this area.
On 2024/9/9 22:46, Colin Walters wrote: > > > On Mon, Sep 9, 2024, at 10:14 AM, Gao Xiang wrote: >> >> Not quite sure about hard limitation in EROFS >> runtime, we could define >> >> #define EROFS_SYMLINK_MAXLEN 4096 > > Not sure that a new define is needed versus just reusing PATH_MAX, but that's obviously just a style thing that's much more your call than mine. > >> But since symlink i_size > 4096 only due to crafted >> images (and not generated by mkfs) and not crash, so >> either way (to check or not check in kernel) is okay >> to me. > > Yes, but my understanding was that EROFS (in contrast to other kernel read-write filesystems which are more complicated) was aiming to be robust against potentially malicious images. Just my personal opinion, my understanding of rubustness is stability and security. But whether to check or not check this, it doesn't crash the kernel or deadlock or livelock, so IMHO, it's already rubustness. Actually, I think EROFS for i_size > PAGE_SIZE, it's an undefined or reserved behavior for now (just like CPU reserved bits or don't care bits), just Linux implementation treats it with PAGE_SIZE-1 trailing '\0', but using erofs dump tool you could still dump large symlinks. Since PATH_MAX is a system-defined constant too, currently Linux PATH_MAX is 4096, but how about other OSes? I've seen some `PATH_MAX 8192` reference but I'm not sure which OS uses this setting. But I think it's a filesystem on-disk limitation, but if i_size exceeds that, we return -EOPNOTSUPP or -EFSCORRUPTED? For this symlink case, I tend to return -EFSCORRUPTED but for other similar but complex cases, it could be hard to decide. Leaving them as undefined behaviors are also an option as long as the behavior is secure. > > Crafted/malicious images aside, there's also the IMO obvious angle here that we should avoid crashes or worse out-of-bound read/write if there happens to be *accidental* on-disk/memory corruption and having high bit(s) flip in a symlink inode size seems like an easy one to handle. Skimming the XFS code for example it looks like it's pretty robust in this area. Yes, for this case it's much simple and easy so that's fine, but I think for some other cases, leaving some undefined or reserved behaviors are also good for later extendability (again, like CPU register design.) as long as it doesn't cause security issues. Thanks, Gao Xiang
On Mon, Sep 9, 2024, at 11:40 AM, Gao Xiang wrote: > > Just my personal opinion, my understanding of rubustness > is stability and security. > > But whether to check or not check this, it doesn't crash > the kernel or deadlock or livelock, so IMHO, it's already > rubustness. OK, if you're telling me this is already checked elsewhere then I think we can call this end of thread. > Actually, I think EROFS for i_size > PAGE_SIZE, it's an > undefined or reserved behavior for now (just like CPU > reserved bits or don't care bits), just Linux > implementation treats it with PAGE_SIZE-1 trailing '\0', > but using erofs dump tool you could still dump large > symlinks. > > Since PATH_MAX is a system-defined constant too, currently > Linux PATH_MAX is 4096, but how about other OSes? I've > seen some `PATH_MAX 8192` reference but I'm not sure which > OS uses this setting. Famously GNU Hurd tried to avoid having a PATH_MAX at all: https://www.gnu.org/software/hurd/community/gsoc/project_ideas/maxpath.html But I am pretty confident in saying that Linux isn't going to try to or really be able to meaningfully change its PATH_MAX in the forseeable future. Now we're firmly off into the weeds but since it's kind of an interesting debate: Honestly: who would *want* to ever interact with such long file paths? And I think a much better evolutionary direction is already in flight - operate in terms of directory fds with openat() etc. While it'd be logistically complicated one could imagine a variant of a Unix filesystem which hard required using openat() to access sub-domains where the paths in that sub-domain are limited to the existing PATH_MAX. I guess that kind of already exists in subvolumes/snapshots, and we're effectively enabling this with composefs too. > But I think it's a filesystem on-disk limitation, but if > i_size exceeds that, we return -EOPNOTSUPP or -EFSCORRUPTED? > For this symlink case, I tend to return -EFSCORRUPTED but > for other similar but complex cases, it could be hard to > decide. -EFSCORRUPTED is what XFS does at least (in xfs_inactive_symlink()).
On 2024/9/10 08:12, Colin Walters wrote: > > > On Mon, Sep 9, 2024, at 11:40 AM, Gao Xiang wrote: >> >> Just my personal opinion, my understanding of rubustness >> is stability and security. >> >> But whether to check or not check this, it doesn't crash >> the kernel or deadlock or livelock, so IMHO, it's already >> rubustness. > > OK, if you're telling me this is already checked elsewhere then I think we can call this end of thread. I know you ask for an explicit check on symlink i_size, but I've explained the current kernel behavior: - For symlink i_size < PAGE_SIZE (always >= 4096 on Linux), it behaves normally for EROFS Linux implementation; - For symlink i_size >= PAGE_SIZE, EROFS Linux implementation will mark '\0' at PAGE_SIZE - 1 in page_get_link() -> nd_terminate_link() so the behavior is also deterministic and not harmful to the system stability and security; In order to verify this, you could also check the EROFS image (encoded in gzip+base64) with a 16000-byte symlink file below: H4sICPqj32YCA3Rlc3QuZXJvZnMA7dsvSwNhHMDx350IBotgt0wMwsnegDCYb8FiVbu4LAom kygbgwURfR1Wm12Lf5JFTIpY9B58GCK2BYV9vvDhee64226sXPlFSBrXHu5f7k4u+isT9X46 GlHm8+9nt5tpHaxOxeS364+Wjh8PXtvP3bn9/nXvpjvq96fPfmpFLObjj7q07lzOxnq9Hubn eLvaapa/3N8YruVwP59+Rb54IYqoqqqzsd3xZ0uSJGnsSy/GAAAAAAAAAAAAAADA/5bmb89P I3aXv+YBiuzn/O3azF6zMD8AAADAHzHBP1qf7k2HRABQAAA= Here the symlink is already recorded in the image (let's not think if the symlink is reasonable or not for now), but Linux implementation will just truncate it as a 4095-byte link path, that is the current release behavior and it has no security issue inside. In other words, currently i_size >= PAGE_SIZE is an undefined behavior but Linux just truncates the link path. So I understand that what you propose here is to check the size explicitly, which means we need to introduce a formal on-disk hard limitation, for example: - Introduce EROFS_SYMLINK_MAXLEN as 4095; - For symlink i_size < EROFS_SYMLINK_MAXLEN, it behaves normally; - For symlink i_size >= EROFS_SYMLINK_MAXLEN, it just return -EFSCORRUPTED to refuse such inodes; So that is an added limitation (just like a "don't care" bit into a "meaningful" bit). For this case, to be clear I'm totally fine with the limitation, but I need to decide whether I should make "EROFS_SYMLINK_MAXLEN" as 4095 or "EROFS_SYMLINK_MAXLEN" as 4096 but also accepts `link[4095] == '\0'`. No matter which is finalled selected, it's a new hard on-disk limitation, which means we cannot change the limitation anymore (-EFSCORRUPTED) unless some strong reason as a bugfix. > >> Actually, I think EROFS for i_size > PAGE_SIZE, it's an >> undefined or reserved behavior for now (just like CPU >> reserved bits or don't care bits), just Linux >> implementation treats it with PAGE_SIZE-1 trailing '\0', >> but using erofs dump tool you could still dump large >> symlinks. >> >> Since PATH_MAX is a system-defined constant too, currently >> Linux PATH_MAX is 4096, but how about other OSes? I've >> seen some `PATH_MAX 8192` reference but I'm not sure which >> OS uses this setting. > > Famously GNU Hurd tried to avoid having a PATH_MAX at all: > https://www.gnu.org/software/hurd/community/gsoc/project_ideas/maxpath.html > > But I am pretty confident in saying that Linux isn't going to try to or really be able to meaningfully change its PATH_MAX in the forseeable future. > > Now we're firmly off into the weeds but since it's kind of an interesting debate: Honestly: who would *want* to ever interact with such long file paths? And I think a much better evolutionary direction is already in flight - operate in terms of directory fds with openat() etc. While it'd be logistically complicated one could imagine a variant of a Unix filesystem which hard required using openat() to access sub-domains where the paths in that sub-domain are limited to the existing PATH_MAX. I guess that kind of already exists in subvolumes/snapshots, and we're effectively enabling this with composefs too. Yes, but that is just off-topic. I just need to confirm that `PATH_MAX >= 4096 is absolutely nonsense for all OSes on earth`. If there is a path larger than 4096 in some OS, we maybe (just maybe) run into an awkward situation: EROFS can have some limitation to be used as an _archive format_ on this OS. Similar to EXT4->XFS, if XFS is used as an _archive format_ there could be a possiability that a EXT4 symlink cannot be represented correctly according to its on-disk format. So as an _archive format_, XFS is more limited now (Please don't misunderstand, I'm not saying XFS is not a great filesystem. I like XFS.) Thanks, Gao Xiang > >> But I think it's a filesystem on-disk limitation, but if >> i_size exceeds that, we return -EOPNOTSUPP or -EFSCORRUPTED? >> For this symlink case, I tend to return -EFSCORRUPTED but >> for other similar but complex cases, it could be hard to >> decide. > > -EFSCORRUPTED is what XFS does at least (in xfs_inactive_symlink()).
On Mon, Sep 9, 2024, at 10:18 PM, Gao Xiang wrote: > I know you ask for an explicit check on symlink i_size, but > I've explained the current kernel behavior: > - For symlink i_size < PAGE_SIZE (always >= 4096 on Linux), > it behaves normally for EROFS Linux implementation; > > - For symlink i_size >= PAGE_SIZE, EROFS Linux > implementation will mark '\0' at PAGE_SIZE - 1 in > page_get_link() -> nd_terminate_link() so the behavior is also > deterministic and not harmful to the system stability and security; Got it, OK. > In other words, currently i_size >= PAGE_SIZE is an undefined behavior > but Linux just truncates the link path. I think where we had a miscommunication is that when I see "undefined behavior" I thought you were using the formal term: https://en.wikipedia.org/wiki/Undefined_behavior The term for what you're talking about in my experience is usually "unspecified behavior" or "implementation defined behavior" which (assuming a reasonable implementor) would include silent truncation or an explicit error, but *not* walking off the end of a buffer and writing to arbitrary other kernel memory etc. (Hmm really given the widespread use of nd_terminate_link I guess this is kind of more of a "Linux convention" than just an EROFS one, with XFS as a notable exception?) > For this case, to be clear I'm totally fine with the limitation, > but I need to decide whether I should make "EROFS_SYMLINK_MAXLEN" > as 4095 or "EROFS_SYMLINK_MAXLEN" as 4096 but also accepts > `link[4095] == '\0'`. Mmmm...I think PATH_MAX is conventionally taken to include the NUL; yeah see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c?id=b40c8e7a033ff2cafd33adbe50e2a516f88fa223#n123
On 2024/9/11 04:51, Colin Walters wrote: > > > On Mon, Sep 9, 2024, at 10:18 PM, Gao Xiang wrote: > >> I know you ask for an explicit check on symlink i_size, but >> I've explained the current kernel behavior: >> - For symlink i_size < PAGE_SIZE (always >= 4096 on Linux), >> it behaves normally for EROFS Linux implementation; >> >> - For symlink i_size >= PAGE_SIZE, EROFS Linux >> implementation will mark '\0' at PAGE_SIZE - 1 in >> page_get_link() -> nd_terminate_link() so the behavior is also >> deterministic and not harmful to the system stability and security; > > Got it, OK. > >> In other words, currently i_size >= PAGE_SIZE is an undefined behavior >> but Linux just truncates the link path. > > I think where we had a miscommunication is that when I see "undefined behavior" I thought you were using the formal term: https://en.wikipedia.org/wiki/Undefined_behavior > > The term for what you're talking about in my experience is usually "unspecified behavior" or "implementation defined behavior" which (assuming a reasonable implementor) would include silent truncation or an explicit error, but *not* walking off the end of a buffer and writing to arbitrary other kernel memory etc. Yeah, agreed. "implementation defined behavior" sounds a better term. Sorry about my limited English corpus, because the environment I'm living mostly is used to professional terms translated in Chinese.. > > (Hmm really given the widespread use of nd_terminate_link I guess this is kind of more of a "Linux convention" than just an EROFS one, with XFS as a notable exception?) I'm not sure if other kernel fses have their own internal issues (so they need to check i_size > PAGE_SIZE to cover up their own format design in advance), but I think (and tested with crafted images) EROFS with pure only Linux VFS nd_terminate_link() implementation (since 2.6.x era) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebd09abbd9699f328165aee50a070403fbf55a37 is already safe on i_size > PAGE_SIZE since EROFS symlink on-disk format is just like its regular inode format. As for XFS, I think it's a history on-disk behavior (1024-byte hard limitation) so they have to follow until now, see the related commit message: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6eb0b8df9f74f33d1a69100117630a7a87a9cc96 > >> For this case, to be clear I'm totally fine with the limitation, >> but I need to decide whether I should make "EROFS_SYMLINK_MAXLEN" >> as 4095 or "EROFS_SYMLINK_MAXLEN" as 4096 but also accepts >> `link[4095] == '\0'`. > > Mmmm...I think PATH_MAX is conventionally taken to include the NUL; yeah see > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c?id=b40c8e7a033ff2cafd33adbe50e2a516f88fa223#n123 Agreed, but honestly I have some concern if some OS or tar format or other popular archive formats support large symlinks but EROFS have no way to keep them due to on-disk limitation. If you don't have some strong opinion on this, I do hope let's hold off our decision about this to ensure compatibility. Thanks, Gao Xiang
© 2016 - 2024 Red Hat, Inc.