:p
atchew
Login
When looking at something else, I noticed that the local endpoint entries list was iterated under rcu_read_lock, but using list_for_each_entry() instead of the _rcu variant. That's what patch 1 is fixing. At the previous meeting, Mat and Christoph mentioned we should also use this _rcu variant in mptcp_pm_nl_set_flags(). But then the modifications look too important for -net, because __lookup_addr_by_id() also needs to be modified, and similar parts in the code as well to harmonise how entries are read from the list. That's what is done in patch 2, for -next then. Patch 3 is a simple change to remove duplicated code. Note: I see that we are using spin_lock_bh(), but the RCU read "locks" are always used without the _bh() variant. Is that OK here, or did we miss something? Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Changes in v2: - Add patch 2 and 3 - Patch 1: avoid > 80 chars per line in __lookup_addr_rcu() + update commit message. - Link to v1: https://lore.kernel.org/r/20241022-mptcp-pm-lookup_addr_rcu-v1-1-19d45f26c872@kernel.org --- Geliang Tang (1): mptcp: pm: avoid code duplication to lookup endp Matthieu Baerts (NGI0) (2): mptcp: pm: use _rcu variant under rcu_read_lock mptcp: pm: lockless list traversal net/mptcp/pm_netlink.c | 69 +++++++++++++++++++++++++++----------------------- 1 file changed, 38 insertions(+), 31 deletions(-) --- base-commit: c2990db510fcdde78645791def80551c0f77ff76 change-id: 20241022-mptcp-pm-lookup_addr_rcu-01833ea95155 Best regards, -- Matthieu Baerts (NGI0) <matttbe@kernel.org>
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are used as expected to iterate over the list of local addresses, but list_for_each_entry() was used instead of list_for_each_entry_rcu() in __lookup_addr() (and lookup_id_by_addr() before). It is important to use this variant which adds the required READ_ONCE() (and diagnostic checks if enabled). Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it is called under the pernet->lock, the _rcu variant cannot be used in all cases. A new helper is then created. Note that this new helper can be reused later to reduce some duplicated code elsewhere in this file, and some sections could be used lockless, also using this new helper then. But all of these extra modifications should probably be better considered as new improvements, and not as fixes. Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_netlink.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -XXX,XX +XXX,XX @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) return NULL; } +static struct mptcp_pm_addr_entry * +__lookup_addr_rcu(struct pm_nl_pernet *pernet, + const struct mptcp_addr_info *info) +{ + struct mptcp_pm_addr_entry *entry; + + list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { + if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) + return entry; + } + return NULL; +} + static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { struct sock *sk = (struct sock *)msk; @@ -XXX,XX +XXX,XX @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) mptcp_local_address((struct sock_common *)msk->first, &mpc_addr); rcu_read_lock(); - entry = __lookup_addr(pernet, &mpc_addr); + entry = __lookup_addr_rcu(pernet, &mpc_addr); if (entry) { __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); msk->mpc_endpoint_id = entry->addr.id; -- 2.45.2
In a few places -- to get an endpoint, dump all of them, and change their flags -- the list is iterated while holding the pernet->lock, but only to read the content of the list. In these cases, we can replace the spin locks, by RCU read ones, and use the _rcu variants to iterate over the entries list in a lockless way. To make it clear, the lookup helpers using the _rcu variant are renamed with a _rcu suffix. The previous __lookup_addr() helper can then be removed, but __lookup_addr_by_id() is still needed. While at it, the IDs bitmap is copied before iterating the list to dump the different addresses, to avoid any consistencies. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Notes: - This is not a fix, a small improvement for -next. --- net/mptcp/pm_netlink.c | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -XXX,XX +XXX,XX @@ __lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) } static struct mptcp_pm_addr_entry * -__lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) +__lookup_addr_by_id_rcu(struct pm_nl_pernet *pernet, unsigned int id) { struct mptcp_pm_addr_entry *entry; - list_for_each_entry(entry, &pernet->local_addr_list, list) { - if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) + list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { + if (entry->addr.id == id) return entry; } return NULL; @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_get_addr(struct sk_buff *skb, struct genl_info *info) goto fail; } - spin_lock_bh(&pernet->lock); - entry = __lookup_addr_by_id(pernet, addr.addr.id); + rcu_read_lock(); + entry = __lookup_addr_by_id_rcu(pernet, addr.addr.id); if (!entry) { GENL_SET_ERR_MSG(info, "address not found"); ret = -EINVAL; @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_get_addr(struct sk_buff *skb, struct genl_info *info) genlmsg_end(msg, reply); ret = genlmsg_reply(msg, info); - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); return ret; unlock_fail: - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); fail: nlmsg_free(msg); @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_dump_addr(struct sk_buff *msg, struct net *net = sock_net(msg->sk); struct mptcp_pm_addr_entry *entry; struct pm_nl_pernet *pernet; + unsigned long id_bitmap[4]; int id = cb->args[0]; void *hdr; int i; pernet = pm_nl_get_pernet(net); + bitmap_copy(id_bitmap, pernet->id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); - spin_lock_bh(&pernet->lock); + rcu_read_lock(); for (i = id; i < MPTCP_PM_MAX_ADDR_ID + 1; i++) { - if (test_bit(i, pernet->id_bitmap)) { - entry = __lookup_addr_by_id(pernet, i); + if (test_bit(i, id_bitmap)) { + entry = __lookup_addr_by_id_rcu(pernet, i); if (!entry) break; @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_dump_addr(struct sk_buff *msg, genlmsg_end(msg, hdr); } } - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); cb->args[0] = id; return msg->len; @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_set_flags(struct sk_buff *skb, struct genl_info *info) if (addr.flags & MPTCP_PM_ADDR_FLAG_BACKUP) bkup = 1; - spin_lock_bh(&pernet->lock); - entry = lookup_by_id ? __lookup_addr_by_id(pernet, addr.addr.id) : - __lookup_addr(pernet, &addr.addr); + rcu_read_lock(); + entry = lookup_by_id ? __lookup_addr_by_id_rcu(pernet, addr.addr.id) : + __lookup_addr_rcu(pernet, &addr.addr); if (!entry) { - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); GENL_SET_ERR_MSG(info, "address not found"); return -EINVAL; } if ((addr.flags & MPTCP_PM_ADDR_FLAG_FULLMESH) && (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) { - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); GENL_SET_ERR_MSG(info, "invalid addr flags"); return -EINVAL; } @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_set_flags(struct sk_buff *skb, struct genl_info *info) changed = (addr.flags ^ entry->flags) & mask; entry->flags = (entry->flags & ~mask) | (addr.flags & mask); addr = *entry; - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); mptcp_nl_set_flags(net, &addr.addr, bkup, changed); return 0; -- 2.45.2
From: Geliang Tang <tanggeliang@kylinos.cn> The helper __lookup_addr_rcu() can be used in mptcp_pm_nl_get_local_id() and mptcp_pm_nl_is_backup() to simplify the code, and avoid code duplication. Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> --- Notes: - This is also for -next. --- net/mptcp/pm_netlink.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc { struct mptcp_pm_addr_entry *entry; struct pm_nl_pernet *pernet; - int ret = -1; + int ret; pernet = pm_nl_get_pernet_from_msk(msk); rcu_read_lock(); - list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (mptcp_addresses_equal(&entry->addr, skc, entry->addr.port)) { - ret = entry->addr.id; - break; - } - } + entry = __lookup_addr_rcu(pernet, skc); + ret = entry ? entry->addr.id : -1; rcu_read_unlock(); if (ret >= 0) return ret; @@ -XXX,XX +XXX,XX @@ bool mptcp_pm_nl_is_backup(struct mptcp_sock *msk, struct mptcp_addr_info *skc) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct mptcp_pm_addr_entry *entry; - bool backup = false; + bool backup; rcu_read_lock(); - list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (mptcp_addresses_equal(&entry->addr, skc, entry->addr.port)) { - backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); - break; - } - } + entry = __lookup_addr_rcu(pernet, skc); + backup = entry && !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); rcu_read_unlock(); return backup; -- 2.45.2
When looking at something else, I noticed that the local endpoint entries list was iterated under rcu_read_lock, but using list_for_each_entry() instead of the _rcu variant. That's what patch 1 is fixing. At the previous meeting, Mat and Christoph mentioned we could use the RCU variant elsewhere. That's what is done in patch 2, for -next then. Patch 3 is a simple change to remove duplicated code. Note: I see that we are using spin_lock_bh(), but the RCU read "locks" are always used without the _bh() variant. Is that OK here, or did we miss something? Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Changes in v3: - Use list_for_each_entry_rcu() with a 4th parameter: lockdep_is_held() - See individual changelog in the different patches. - Link to v2: https://lore.kernel.org/r/20241025-mptcp-pm-lookup_addr_rcu-v2-0-1478f6c4b205@kernel.org Changes in v2: - Add patch 2 and 3 - Patch 1: avoid > 80 chars per line in __lookup_addr_rcu() + update commit message. - Link to v1: https://lore.kernel.org/r/20241022-mptcp-pm-lookup_addr_rcu-v1-1-19d45f26c872@kernel.org --- Geliang Tang (1): mptcp: pm: avoid code duplication to lookup endp Matthieu Baerts (NGI0) (2): mptcp: pm: use _rcu variant under rcu_read_lock mptcp: pm: lockless list traversal to dump endpoints net/mptcp/pm_netlink.c | 36 +++++++++++++++--------------------- 1 file changed, 15 insertions(+), 21 deletions(-) --- base-commit: ca26062fcd85c61922f543674c5dd0382e2059cd change-id: 20241022-mptcp-pm-lookup_addr_rcu-01833ea95155 Best regards, -- Matthieu Baerts (NGI0) <matttbe@kernel.org>
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are used as expected to iterate over the list of local addresses, but list_for_each_entry() was used instead of list_for_each_entry_rcu() in __lookup_addr(). It is important to use this variant which adds the required READ_ONCE() (and diagnostic checks if enabled). Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it is called under the pernet->lock and not rcu_read_lock(), an extra condition is then passed to help the diagnostic checks making sure either the associated spin lock or the RCU lock is held. Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- v3: - Use list_for_each_entry_rcu() with lockdep_is_held(). - Update commit message accordingly. --- net/mptcp/pm_netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -XXX,XX +XXX,XX @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) { struct mptcp_pm_addr_entry *entry; - list_for_each_entry(entry, &pernet->local_addr_list, list) { + list_for_each_entry_rcu(entry, &pernet->local_addr_list, list, + lockdep_is_held(&pernet->lock)) { if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) return entry; } -- 2.45.2
To return an endpoint to the userspace via Netlink, and to dump all of them, the endpoint list was iterated while holding the pernet->lock, but only to read the content of the list. In these cases, the spin locks can be replaced by RCU read ones, and use the _rcu variants to iterate over the entries list in a lockless way. Note that the __lookup_addr_by_id() helper has been modified to use the _rcu variants of list_for_each_entry(), but with an extra conditions, so it can be called either while the RCU read lock is held, or when the associated pernet->lock is held. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Notes: - This is not a fix, a small improvement for -next. - v3: - Use list_for_each_entry_rcu() with lockdep_is_held() instead of duplicating the helper. So no need to rename the helper with _rcu suffix. - No copy of the bitmap in dump_addr() (Mat) - No need to modify set flags. (Paolo) --- net/mptcp/pm_netlink.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -XXX,XX +XXX,XX @@ __lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) { struct mptcp_pm_addr_entry *entry; - list_for_each_entry(entry, &pernet->local_addr_list, list) { + list_for_each_entry_rcu(entry, &pernet->local_addr_list, list, + lockdep_is_held(&pernet->lock)) { if (entry->addr.id == id) return entry; } @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_get_addr(struct sk_buff *skb, struct genl_info *info) goto fail; } - spin_lock_bh(&pernet->lock); + rcu_read_lock(); entry = __lookup_addr_by_id(pernet, addr.addr.id); if (!entry) { GENL_SET_ERR_MSG(info, "address not found"); @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_get_addr(struct sk_buff *skb, struct genl_info *info) genlmsg_end(msg, reply); ret = genlmsg_reply(msg, info); - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); return ret; unlock_fail: - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); fail: nlmsg_free(msg); @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_dump_addr(struct sk_buff *msg, pernet = pm_nl_get_pernet(net); - spin_lock_bh(&pernet->lock); + rcu_read_lock(); for (i = id; i < MPTCP_PM_MAX_ADDR_ID + 1; i++) { if (test_bit(i, pernet->id_bitmap)) { entry = __lookup_addr_by_id(pernet, i); @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_dump_addr(struct sk_buff *msg, genlmsg_end(msg, hdr); } } - spin_unlock_bh(&pernet->lock); + rcu_read_unlock(); cb->args[0] = id; return msg->len; -- 2.45.2
From: Geliang Tang <tanggeliang@kylinos.cn> The helper __lookup_addr() can be used in mptcp_pm_nl_get_local_id() and mptcp_pm_nl_is_backup() to simplify the code, and avoid code duplication. Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> --- Notes: - This is also for -next. - v3: - use __lookup_addr() instead of __lookup_addr_rcu() that has been removed in v3. --- net/mptcp/pm_netlink.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc { struct mptcp_pm_addr_entry *entry; struct pm_nl_pernet *pernet; - int ret = -1; + int ret; pernet = pm_nl_get_pernet_from_msk(msk); rcu_read_lock(); - list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (mptcp_addresses_equal(&entry->addr, skc, entry->addr.port)) { - ret = entry->addr.id; - break; - } - } + entry = __lookup_addr(pernet, skc); + ret = entry ? entry->addr.id : -1; rcu_read_unlock(); if (ret >= 0) return ret; @@ -XXX,XX +XXX,XX @@ bool mptcp_pm_nl_is_backup(struct mptcp_sock *msk, struct mptcp_addr_info *skc) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct mptcp_pm_addr_entry *entry; - bool backup = false; + bool backup; rcu_read_lock(); - list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (mptcp_addresses_equal(&entry->addr, skc, entry->addr.port)) { - backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); - break; - } - } + entry = __lookup_addr(pernet, skc); + backup = entry && !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); rcu_read_unlock(); return backup; -- 2.45.2