[PATCH v2] libceph: tolerate addrvecs with multiple entries of the same type

Kefu Chai posted 1 patch 3 days, 17 hours ago
ceph_decode_entity_addrvec() rejects any addrvec containing more than
one entry that matches the requested msgr type (LEGACY or MSGR2),
logging "another match of type N in addrvec" and returning -EINVAL.

Some admin tooling (e.g. pveceph mon create from Proxmox VE) generates
addrvecs with multiple same-type entries when public_network lists more
than one CIDR: it picks one local IP per subnet and emits both a v2 and
a v1 entry for each IP.  Monmaps shaped this way cause:

  libceph: mon0 (1)10.10.10.15:6789 session established
  libceph: another match of type 1 in addrvec
  libceph: problem decoding monmap, -22
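A minimal userspace sketch of the addrvec shape described above. It assumes the tool emits the v2 entry before the v1 entry for each IP, which is the order used in the reproduction command further down. The struct, enum, and helper names are hypothetical, and 10.10.20.15 is an invented second-subnet address. 3300 and 6789 are Ceph's default msgr2 and legacy ports. With two subnets, each type appears twice, and the pre-patch decoder rejects exactly that shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's CEPH_ENTITY_ADDR_TYPE_* values;
 * the log above reports the v1 entry as type 1 (LEGACY). */
enum addr_type { ADDR_TYPE_LEGACY = 1, ADDR_TYPE_MSGR2 = 2 };

/* Simplified addrvec entry: only what is needed to reason about types. */
struct addr_entry {
	enum addr_type type;
	char ip[16];		/* dotted-quad IPv4 only, for brevity */
	unsigned short port;
};

/*
 * Emit one v2 entry and then one v1 entry per local IP, i.e. the shape
 * described for a multi-CIDR public_network.  Returns the number of
 * entries written to out (never more than cap).
 */
static size_t build_addrvec(const char *const *ips, size_t n_ips,
			    struct addr_entry *out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < n_ips && n + 2 <= cap; i++) {
		out[n].type = ADDR_TYPE_MSGR2;
		snprintf(out[n].ip, sizeof(out[n].ip), "%s", ips[i]);
		out[n].port = 3300;	/* default msgr2 port */
		n++;

		out[n].type = ADDR_TYPE_LEGACY;
		snprintf(out[n].ip, sizeof(out[n].ip), "%s", ips[i]);
		out[n].port = 6789;	/* default legacy port */
		n++;
	}
	return n;
}

/* Count how many entries carry the requested type. */
static size_t count_type(const struct addr_entry *v, size_t n,
			 enum addr_type type)
{
	size_t c = 0;

	for (size_t i = 0; i < n; i++)
		if (v[i].type == type)
			c++;
	return c;
}
```

For example, build_addrvec() on {"10.10.10.15", "10.10.20.15"} yields four entries (v2, v1, v2, v1), and count_type() reports 2 for both ADDR_TYPE_LEGACY and ADDR_TYPE_MSGR2.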

No Ceph code uses the extra entries: since Nautilus, the userspace
messenger (AsyncMessenger) unconditionally picks the first address of
the requested type and ignores any subsequent matches.

Match that behavior: use the first matching entry and silently skip any
subsequent ones.  This is a compatibility fix for existing deployments
and does not enable dual-stack or multi-subnet address selection.
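The behavioral change can be sketched in userspace over a plain array of entry types. This is neither the kernel code nor the author's userspace port. The enum values are assumed to mirror CEPH_ENTITY_ADDR_TYPE_LEGACY (1) and CEPH_ENTITY_ADDR_TYPE_MSGR2 (2). Returning -ENOENT when nothing matches is an assumption about a path this patch does not touch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's CEPH_ENTITY_ADDR_TYPE_* values. */
enum { ADDR_TYPE_LEGACY = 1, ADDR_TYPE_MSGR2 = 2 };

/*
 * Pre-patch rule: a second entry of the requested type is an error.
 * Returns the index of the unique match, -EINVAL on a duplicate, or
 * -ENOENT if nothing matched (assumed not-found behavior).
 */
static int select_unique(const int *types, size_t n, int want)
{
	int idx = -1;

	for (size_t i = 0; i < n; i++) {
		if (types[i] != want)
			continue;
		if (idx >= 0)
			return -EINVAL;	/* "another match of type N" */
		idx = (int)i;
	}
	return idx >= 0 ? idx : -ENOENT;
}

/*
 * Post-patch rule: keep the first match and ignore later ones.  The loop
 * still visits every entry, as the kernel decoder must in order to
 * consume the whole addrvec.
 */
static int select_first(const int *types, size_t n, int want)
{
	int idx = -1;

	for (size_t i = 0; i < n; i++) {
		if (types[i] == want && idx < 0)
			idx = (int)i;
	}
	return idx >= 0 ? idx : -ENOENT;
}
```

On {MSGR2, LEGACY, MSGR2, LEGACY}, select_unique() returns -EINVAL for either type, while select_first() returns index 0 for MSGR2 and 1 for LEGACY. For an addrvec with at most one entry per type, both functions agree, so monmaps that decoded before still decode the same way.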

Link: https://bugzilla.proxmox.com/show_bug.cgi?id=7518
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
---
Changes since v1:
- Rewrite commit message to frame as compatibility fix; drop dual-stack/
  multi-subnet framing and the two ceph tracker links
- Simplify comment in ceph_decode_entity_addrvec()

Tested by reproducing the Proxmox BZ 7518 scenario against a vstart
cluster whose mon addrvec was edited to contain two v1 + two v2 entries:

    ceph mon set-addrs a \
        '[v2:$ip1:$p2/0,v1:$ip1:$p1/0,v2:$ip2:$p2/0,v1:$ip2:$p1/0]'

A Debian VM booted with the patched kernel via 'qemu -kernel' then
ran 'mount -t ceph ...:$p1:/ /mnt -o name=admin'.  Pre-patch kernels
fail at monmap decode with "another match of type 1 in addrvec"
(-EINVAL).  Post-patch, decode succeeds and the mount proceeds to
the auth / MDS-discovery stages.

Also verified the decoder logic on the monmap.bin attached to BZ 7518
using a userspace port of ceph_decode_entity_addrvec(): the pre-patch
form returns -EINVAL on both msgr1 and msgr2 lookups; the post-patch
form returns 0 and picks the first matching entry.

 net/ceph/decode.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/net/ceph/decode.c b/net/ceph/decode.c
index bc109a1a4616..18f0a7c71950 100644
--- a/net/ceph/decode.c
+++ b/net/ceph/decode.c
@@ -87,8 +87,9 @@ ceph_decode_entity_addr(void **p, void *end, struct ceph_entity_addr *addr)
 EXPORT_SYMBOL(ceph_decode_entity_addr);
 
 /*
- * Return addr of desired type (MSGR2 or LEGACY) or error.
- * Make sure there is only one match.
+ * Return addr of desired type (MSGR2 or LEGACY) or error.  If multiple
+ * entries of the desired type are present, use the first one for
+ * compatibility with existing deployments.
  *
  * Assume encoding with MSG_ADDR2.
  */
@@ -120,13 +121,7 @@ int ceph_decode_entity_addrvec(void **p, void *end, bool msgr2,
 			return ret;
 
 		dout("%s i %d addr %s\n", __func__, i, ceph_pr_addr(&tmp_addr));
-		if (tmp_addr.type == my_type) {
-			if (found) {
-				pr_err("another match of type %d in addrvec\n",
-				       le32_to_cpu(my_type));
-				return -EINVAL;
-			}
-
+		if (tmp_addr.type == my_type && !found) {
 			memcpy(addr, &tmp_addr, sizeof(*addr));
 			found = true;
 		}
-- 
2.47.3
Re: [PATCH v2] libceph: tolerate addrvecs with multiple entries of the same type
Posted by Viacheslav Dubeyko 3 days, 12 hours ago
On Thu, 2026-06-04 at 22:02 +0800, Kefu Chai wrote:
> ceph_decode_entity_addrvec() rejects any addrvec containing more than
> one entry that matches the requested msgr type (LEGACY or MSGR2),
> logging "another match of type N in addrvec" and returning -EINVAL.
> 
> Some admin tooling (e.g. pveceph mon create from Proxmox VE) generates
> addrvecs with multiple same-type entries when public_network lists more
> than one CIDR: it picks one local IP per subnet and emits both a v2 and
> a v1 entry for each IP.  Monmaps shaped this way cause:
> 
>   libceph: mon0 (1)10.10.10.15:6789 session established
>   libceph: another match of type 1 in addrvec
>   libceph: problem decoding monmap, -22
> 
> No Ceph code uses the extra entries: since Nautilus, the userspace
> messenger (AsyncMessenger) unconditionally picks the first address of
> the requested type and ignores any subsequent matches.
> 
> Match that behavior: use the first matching entry and silently skip any
> subsequent ones.  This is a compatibility fix for existing deployments
> and does not enable dual-stack or multi-subnet address selection.
> 
> Link: https://bugzilla.proxmox.com/show_bug.cgi?id=7518
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> Changes since v1:
> - Rewrite commit message to frame as compatibility fix; drop dual-stack/
>   multi-subnet framing and the two ceph tracker links
> - Simplify comment in ceph_decode_entity_addrvec()
> 
> Tested by reproducing the Proxmox BZ 7518 scenario against a vstart
> cluster whose mon addrvec was edited to contain two v1 + two v2 entries:
> 
>     ceph mon set-addrs a \
>         '[v2:$ip1:$p2/0,v1:$ip1:$p1/0,v2:$ip2:$p2/0,v1:$ip2:$p1/0]'
> 
> A Debian VM booted with the patched kernel via 'qemu -kernel' then
> ran 'mount -t ceph ...:$p1:/ /mnt -o name=admin'.  Pre-patch kernels
> fail at monmap decode with "another match of type 1 in addrvec"
> (-EINVAL).  Post-patch, decode succeeds and the mount proceeds to
> the auth / MDS-discovery stages.
> 
> Also verified the decoder logic on the monmap.bin attached to BZ 7518
> using a userspace port of ceph_decode_entity_addrvec(): the pre-patch
> form returns -EINVAL on both msgr1 and msgr2 lookups; the post-patch
> form returns 0 and picks the first matching entry.
> 
>  net/ceph/decode.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/net/ceph/decode.c b/net/ceph/decode.c
> index bc109a1a4616..18f0a7c71950 100644
> --- a/net/ceph/decode.c
> +++ b/net/ceph/decode.c
> @@ -87,8 +87,9 @@ ceph_decode_entity_addr(void **p, void *end, struct ceph_entity_addr *addr)
>  EXPORT_SYMBOL(ceph_decode_entity_addr);
>  
>  /*
> - * Return addr of desired type (MSGR2 or LEGACY) or error.
> - * Make sure there is only one match.
> + * Return addr of desired type (MSGR2 or LEGACY) or error.  If multiple
> + * entries of the desired type are present, use the first one for
> + * compatibility with existing deployments.
>   *
>   * Assume encoding with MSG_ADDR2.
>   */
> @@ -120,13 +121,7 @@ int ceph_decode_entity_addrvec(void **p, void *end, bool msgr2,
>  			return ret;
>  
>  		dout("%s i %d addr %s\n", __func__, i, ceph_pr_addr(&tmp_addr));
> -		if (tmp_addr.type == my_type) {
> -			if (found) {
> -				pr_err("another match of type %d in addrvec\n",
> -				       le32_to_cpu(my_type));

Maybe we still need some debugging output here? What about dout()?

Thanks,
Slava.

> -				return -EINVAL;
> -			}
> -
> +		if (tmp_addr.type == my_type && !found) {
>  			memcpy(addr, &tmp_addr, sizeof(*addr));
>  			found = true;
>  		}
> --
> 2.47.3
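One way the dout() suggestion could be handled, sketched in userspace: keep the first match and emit a debug line for each skipped duplicate. The existing per-entry dout() in the loop already logs every decoded address, so an extra line would mainly make the skip explicit. The dout() macro here is a stand-in that prints to stderr (in the kernel it is a debug-level print). The message text, the function name, and the skipped counter are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's debug-level dout(). */
#define dout(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical stand-ins for the kernel's CEPH_ENTITY_ADDR_TYPE_* values. */
enum { ADDR_TYPE_LEGACY = 1, ADDR_TYPE_MSGR2 = 2 };

/*
 * Keep the first entry of the wanted type; log and count any later ones
 * instead of failing.  Returns the chosen index, or -1 if none matched.
 */
static int select_first_logged(const int *types, size_t n, int want,
			       unsigned int *skipped)
{
	int idx = -1;

	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (types[i] != want)
			continue;
		if (idx < 0) {
			idx = (int)i;	/* first match wins */
		} else {
			/* Hypothetical message text, not from the patch. */
			dout("%s ignoring extra match of type %d at i %zu\n",
			     __func__, want, i);
			(*skipped)++;
		}
	}
	return idx;
}
```

In kernel terms this would amount to splitting the `tmp_addr.type == my_type && !found` condition back into a nested check, with a dout() in the branch that the pre-patch code used for pr_err() and -EINVAL.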