The batadv_hardif_list list is checked in many situations and the items in this list are given to specialized functions to modify the routing behavior. At the moment each of these called functions has to check itself whether the received batadv_hard_iface has a refcount > 0 before it can increase the reference counter and use it in other objects.
This can easily lead to problems because it is not easily visible where all callers of a function got the batadv_hard_iface object from and whether they already hold a valid reference.
Checking the reference counter directly before calling a subfunction with a pointer from the batadv_hardif_list avoids this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/bat_iv_ogm.c | 11 +++++++++++ net/batman-adv/bat_v_ogm.c | 14 +++++++++++++- net/batman-adv/originator.c | 5 +++++ net/batman-adv/send.c | 6 ++++++ 4 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 2c65668..b390ff9 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -985,9 +985,15 @@ static void batadv_iv_ogm_schedule(struct batadv_hard_iface *hard_iface) list_for_each_entry_rcu(tmp_hard_iface, &batadv_hardif_list, list) { if (tmp_hard_iface->soft_iface != hard_iface->soft_iface) continue; + + if (!kref_get_unless_zero(&tmp_hard_iface->refcount)) + continue; + batadv_iv_ogm_queue_add(bat_priv, *ogm_buff, *ogm_buff_len, hard_iface, tmp_hard_iface, 1, send_time); + + batadv_hardif_put(tmp_hard_iface); } rcu_read_unlock();
@@ -1767,8 +1773,13 @@ static void batadv_iv_ogm_process(const struct sk_buff *skb, int ogm_offset, if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
+ if (!kref_get_unless_zero(&hard_iface->refcount)) + continue; + batadv_iv_ogm_process_per_outif(skb, ogm_offset, orig_node, if_incoming, hard_iface); + + batadv_hardif_put(hard_iface); } rcu_read_unlock();
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c index 4155fa5..473ebb9 100644 --- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -26,6 +26,7 @@ #include <linux/if_ether.h> #include <linux/jiffies.h> #include <linux/kernel.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/netdevice.h> #include <linux/random.h> @@ -176,6 +177,9 @@ static void batadv_v_ogm_send(struct work_struct *work) if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
+ if (!kref_get_unless_zero(&hard_iface->refcount)) + continue; + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Sending own OGM2 packet (originator %pM, seqno %u, throughput %u, TTL %d) on interface %s [%pM]\n", ogm_packet->orig, ntohl(ogm_packet->seqno), @@ -185,10 +189,13 @@ static void batadv_v_ogm_send(struct work_struct *work)
/* this skb gets consumed by batadv_v_ogm_send_to_if() */ skb_tmp = skb_clone(skb, GFP_ATOMIC); - if (!skb_tmp) + if (!skb_tmp) { + batadv_hardif_put(hard_iface); break; + }
batadv_v_ogm_send_to_if(skb_tmp, hard_iface); + batadv_hardif_put(hard_iface); } rcu_read_unlock();
@@ -704,9 +711,14 @@ static void batadv_v_ogm_process(const struct sk_buff *skb, int ogm_offset, if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
+ if (!kref_get_unless_zero(&hard_iface->refcount)) + continue; + batadv_v_ogm_process_per_outif(bat_priv, ethhdr, ogm_packet, orig_node, neigh_node, if_incoming, hard_iface); + + batadv_hardif_put(hard_iface); } rcu_read_unlock(); out: diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index e63d6a5..44c508a 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -1165,6 +1165,9 @@ static bool batadv_purge_orig_node(struct batadv_priv *bat_priv, if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
+ if (!kref_get_unless_zero(&hard_iface->refcount)) + continue; + best_neigh_node = batadv_find_best_neighbor(bat_priv, orig_node, hard_iface); @@ -1172,6 +1175,8 @@ static bool batadv_purge_orig_node(struct batadv_priv *bat_priv, best_neigh_node); if (best_neigh_node) batadv_neigh_node_put(best_neigh_node); + + batadv_hardif_put(hard_iface); } rcu_read_unlock();
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index 3ce06e0..43a950e 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -26,6 +26,7 @@ #include <linux/if.h> #include <linux/jiffies.h> #include <linux/kernel.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/netdevice.h> #include <linux/printk.h> @@ -577,10 +578,15 @@ static void batadv_send_outstanding_bcast_packet(struct work_struct *work) if (forw_packet->num_packets >= hard_iface->num_bcasts) continue;
+ if (!kref_get_unless_zero(&hard_iface->refcount)) + continue; + /* send a copy of the saved skb */ skb1 = skb_clone(forw_packet->skb, GFP_ATOMIC); if (skb1) batadv_send_broadcast_skb(skb1, hard_iface); + + batadv_hardif_put(hard_iface); } rcu_read_unlock();
The receive function may start processing an incoming packet while the hard_iface is shut down in a different context. All called functions called with the batadv_hard_iface object belonging to the incoming interface would have to check whether the reference counter is still > 0.
This is rather error-prone because this check can be forgotten easily. Instead check the reference counter when receiving the object to make sure that all called functions have a valid reference.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/main.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c index d64ddb9..bda5f13 100644 --- a/net/batman-adv/main.c +++ b/net/batman-adv/main.c @@ -401,11 +401,19 @@ int batadv_batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
hard_iface = container_of(ptype, struct batadv_hard_iface, batman_adv_ptype); + + /* don't receive packet for interface which gets shut down by batman-adv + * or otherwise it may gets tried to be referenced again in other + * structures like batadv_forw_packet + */ + if (!kref_get_unless_zero(&hard_iface->refcount)) + goto err_out; + skb = skb_share_check(skb, GFP_ATOMIC);
/* skb was released by skb_share_check() */ if (!skb) - goto err_out; + goto err_put;
/* packet should hold at least type and version */ if (unlikely(!pskb_may_pull(skb, 2))) @@ -448,6 +456,8 @@ int batadv_batman_skb_recv(struct sk_buff *skb, struct net_device *dev, if (ret == NET_RX_DROP) kfree_skb(skb);
+ batadv_hardif_put(hard_iface); + /* return NET_RX_SUCCESS in any case as we * most probably dropped the packet for * routing-logical reasons. @@ -456,6 +466,8 @@ int batadv_batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
err_free: kfree_skb(skb); +err_put: + batadv_hardif_put(hard_iface); err_out: return NET_RX_DROP; }
On Saturday, March 05, 2016 16:09:17 Sven Eckelmann wrote:
The receive function may start processing an incoming packet while the hard_iface is shut down in a different context. All called functions called with the batadv_hard_iface object belonging to the incoming interface would have to check whether the reference counter is still > 0.
This is rather error-prone because this check can be forgotten easily. Instead check the reference counter when receiving the object to make sure that all called functions have a valid reference.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/main.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
Applied in revision fee73a8.
Thanks, Marek
The hard_iface is referenced in the packet_type for batman-adv. Increase the refcounter of the hard_interface for it to have an explicit reference for it in case this functionality gets refactorted and the currently used implicit reference for it will be removed.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/hard-interface.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c index 3240a67..56b148a 100644 --- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -519,6 +519,7 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface, goto err_upper; }
+ kref_get(&hard_iface->refcount); hard_iface->batman_adv_ptype.type = ethertype; hard_iface->batman_adv_ptype.func = batadv_batman_skb_recv; hard_iface->batman_adv_ptype.dev = hard_iface->net_dev; @@ -581,6 +582,7 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface, batadv_info(hard_iface->soft_iface, "Removing interface: %s\n", hard_iface->net_dev->name); dev_remove_pack(&hard_iface->batman_adv_ptype); + batadv_hardif_put(hard_iface);
bat_priv->num_ifaces--; batadv_orig_hash_del_if(hard_iface, bat_priv->num_ifaces);
On Saturday, March 05, 2016 16:09:18 Sven Eckelmann wrote:
The hard_iface is referenced in the packet_type for batman-adv. Increase the refcounter of the hard_interface for it to have an explicit reference for it in case this functionality gets refactorted and the currently used implicit reference for it will be removed.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/hard-interface.c | 2 ++ 1 file changed, 2 insertions(+)
Applied in revision feffc04.
Thanks, Marek
The callers of the functions using batadv_hard_iface objects have to make sure that they already hold a valid reference. The subfunctions don't have to check whether the reference counter is > 0 because this was already checked by the callers.
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- Andrew, this should be what you've reported in https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2016-March/014588.html and what I've added to the issue tracker at https://www.open-mesh.org/issues/244
net/batman-adv/bat_iv_ogm.c | 9 ++------- net/batman-adv/hard-interface.c | 7 +++---- net/batman-adv/originator.c | 30 +++++++----------------------- 3 files changed, 12 insertions(+), 34 deletions(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index b390ff9..d389dc6 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -679,12 +679,6 @@ static void batadv_iv_ogm_aggregate_new(const unsigned char *packet_buff, unsigned char *skb_buff; unsigned int skb_size;
- if (!kref_get_unless_zero(&if_incoming->refcount)) - return; - - if (!kref_get_unless_zero(&if_outgoing->refcount)) - goto out_free_incoming; - /* own packet should always be scheduled */ if (!own_packet) { if (!batadv_atomic_dec_not_zero(&bat_priv->batman_queue_left)) { @@ -716,6 +710,8 @@ static void batadv_iv_ogm_aggregate_new(const unsigned char *packet_buff, forw_packet_aggr->packet_len = packet_len; memcpy(skb_buff, packet_buff, packet_len);
+ kref_get(&if_incoming->refcount); + kref_get(&if_outgoing->refcount); forw_packet_aggr->own = own_packet; forw_packet_aggr->if_incoming = if_incoming; forw_packet_aggr->if_outgoing = if_outgoing; @@ -747,7 +743,6 @@ out_nomem: atomic_inc(&bat_priv->batman_queue_left); out_free_outgoing: batadv_hardif_put(if_outgoing); -out_free_incoming: batadv_hardif_put(if_incoming); }
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c index 56b148a..e70d13f 100644 --- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -236,8 +236,8 @@ static void batadv_primary_if_select(struct batadv_priv *bat_priv,
ASSERT_RTNL();
- if (new_hard_iface && !kref_get_unless_zero(&new_hard_iface->refcount)) - new_hard_iface = NULL; + if (new_hard_iface) + kref_get(&new_hard_iface->refcount);
curr_hard_iface = rcu_dereference_protected(bat_priv->primary_if, 1); rcu_assign_pointer(bat_priv->primary_if, new_hard_iface); @@ -464,8 +464,7 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface, if (hard_iface->if_status != BATADV_IF_NOT_IN_USE) goto out;
- if (!kref_get_unless_zero(&hard_iface->refcount)) - goto out; + kref_get(&hard_iface->refcount);
soft_iface = dev_get_by_name(&init_net, iface_name);
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 44c508a..2d288ea 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -381,12 +381,8 @@ batadv_orig_ifinfo_new(struct batadv_orig_node *orig_node, if (!orig_ifinfo) goto out;
- if (if_outgoing != BATADV_IF_DEFAULT && - !kref_get_unless_zero(&if_outgoing->refcount)) { - kfree(orig_ifinfo); - orig_ifinfo = NULL; - goto out; - } + if (if_outgoing != BATADV_IF_DEFAULT) + kref_get(&if_outgoing->refcount);
reset_time = jiffies - 1; reset_time -= msecs_to_jiffies(BATADV_RESET_PROTECTION_MS); @@ -462,11 +458,8 @@ batadv_neigh_ifinfo_new(struct batadv_neigh_node *neigh, if (!neigh_ifinfo) goto out;
- if (if_outgoing && !kref_get_unless_zero(&if_outgoing->refcount)) { - kfree(neigh_ifinfo); - neigh_ifinfo = NULL; - goto out; - } + if (if_outgoing) + kref_get(&if_outgoing->refcount);
INIT_HLIST_NODE(&neigh_ifinfo->list); kref_init(&neigh_ifinfo->refcount); @@ -539,15 +532,11 @@ batadv_hardif_neigh_create(struct batadv_hard_iface *hard_iface, if (hardif_neigh) goto out;
- if (!kref_get_unless_zero(&hard_iface->refcount)) - goto out; - hardif_neigh = kzalloc(sizeof(*hardif_neigh), GFP_ATOMIC); - if (!hardif_neigh) { - batadv_hardif_put(hard_iface); + if (!hardif_neigh) goto out; - }
+ kref_get(&hard_iface->refcount); INIT_HLIST_NODE(&hardif_neigh->list); ether_addr_copy(hardif_neigh->addr, neigh_addr); hardif_neigh->if_incoming = hard_iface; @@ -650,16 +639,11 @@ batadv_neigh_node_new(struct batadv_orig_node *orig_node, if (!neigh_node) goto out;
- if (!kref_get_unless_zero(&hard_iface->refcount)) { - kfree(neigh_node); - neigh_node = NULL; - goto out; - } - INIT_HLIST_NODE(&neigh_node->list); INIT_HLIST_HEAD(&neigh_node->ifinfo_list); spin_lock_init(&neigh_node->ifinfo_lock);
+ kref_get(&hard_iface->refcount); ether_addr_copy(neigh_node->addr, neigh_addr); neigh_node->if_incoming = hard_iface; neigh_node->orig_node = orig_node;
On Saturday, March 05, 2016 16:09:19 Sven Eckelmann wrote:
The callers of the functions using batadv_hard_iface objects have to make sure that they already hold a valid reference. The subfunctions don't have to check whether the reference counter is > 0 because this was already checked by the callers.
You say the callers have to make sure that valid references exist but do they or is this coming later or .. ?
--- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -679,12 +679,6 @@ static void batadv_iv_ogm_aggregate_new(const unsigned char *packet_buff, unsigned char *skb_buff; unsigned int skb_size;
- if (!kref_get_unless_zero(&if_incoming->refcount))
return;
- if (!kref_get_unless_zero(&if_outgoing->refcount))
goto out_free_incoming;
- /* own packet should always be scheduled */ if (!own_packet) { if (!batadv_atomic_dec_not_zero(&bat_priv->batman_queue_left)) {
@@ -716,6 +710,8 @@ static void batadv_iv_ogm_aggregate_new(const unsigned char *packet_buff, forw_packet_aggr->packet_len = packet_len; memcpy(skb_buff, packet_buff, packet_len);
- kref_get(&if_incoming->refcount);
- kref_get(&if_outgoing->refcount); forw_packet_aggr->own = own_packet; forw_packet_aggr->if_incoming = if_incoming; forw_packet_aggr->if_outgoing = if_outgoing;
@@ -747,7 +743,6 @@ out_nomem: atomic_inc(&bat_priv->batman_queue_left); out_free_outgoing: batadv_hardif_put(if_outgoing); -out_free_incoming: batadv_hardif_put(if_incoming); }
This introduces a refcount imbalance If I am not mistaken. At the beginning of the function we jump to 'out_free_outgoing'.
Cheers, Marek
On Monday 11 April 2016 18:13:09 Marek Lindner wrote:
On Saturday, March 05, 2016 16:09:19 Sven Eckelmann wrote:
The callers of the functions using batadv_hard_iface objects have to make sure that they already hold a valid reference. The subfunctions don't have to check whether the reference counter is > 0 because this was already checked by the callers.
You say the callers have to make sure that valid references exist but do they or is this coming later or .. ?
This was in
[PATCH 1/9] batman-adv: Check hard_iface refcnt before calling function [PATCH 2/9] batman-adv: Check hard_iface refcnt when receiving skb
Kind regards, Sven
batadv_tvlv_container_get requires that tvlv.container_list_lock is held by the caller. It is therefore not possible that an item in tvlv.container_list has an reference counter of 0 and is still in the list
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/main.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c index bda5f13..1245a58 100644 --- a/net/batman-adv/main.c +++ b/net/batman-adv/main.c @@ -748,9 +748,7 @@ static struct batadv_tvlv_container if (tvlv_tmp->tvlv_hdr.version != version) continue;
- if (!kref_get_unless_zero(&tvlv_tmp->refcount)) - continue; - + kref_get(&tvlv_tmp->refcount); tvlv = tvlv_tmp; break; }
On Saturday, March 05, 2016 16:09:20 Sven Eckelmann wrote:
batadv_tvlv_container_get requires that tvlv.container_list_lock is held by the caller. It is therefore not possible that an item in tvlv.container_list has an reference counter of 0 and is still in the list
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/main.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
Applied in revision 21a288f.
Thanks, Marek
batadv_nc_get_nc_node requires that the caller already has a valid reference for orig_neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/network-coding.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c index 32f9fa1..4f7a558 100644 --- a/net/batman-adv/network-coding.c +++ b/net/batman-adv/network-coding.c @@ -856,8 +856,7 @@ static struct batadv_nc_node if (!nc_node) return NULL;
- if (!kref_get_unless_zero(&orig_neigh_node->refcount)) - goto free; + kref_get(&orig_neigh_node->refcount);
/* Initialize nc_node */ INIT_LIST_HEAD(&nc_node->list); @@ -884,10 +883,6 @@ static struct batadv_nc_node spin_unlock_bh(lock);
return nc_node; - -free: - kfree(nc_node); - return NULL; }
/**
On Saturday, March 05, 2016 16:09:21 Sven Eckelmann wrote:
batadv_nc_get_nc_node requires that the caller already has a valid reference for orig_neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/network-coding.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
Applied in revision e0fbb37.
Thanks, Marek
batadv_gw_select requires that the caller already has a valid reference for new_gw_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/gateway_client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c index c59aff5..bb1c4f3 100644 --- a/net/batman-adv/gateway_client.c +++ b/net/batman-adv/gateway_client.c @@ -135,8 +135,8 @@ static void batadv_gw_select(struct batadv_priv *bat_priv,
spin_lock_bh(&bat_priv->gw.list_lock);
- if (new_gw_node && !kref_get_unless_zero(&new_gw_node->refcount)) - new_gw_node = NULL; + if (new_gw_node) + kref_get(&new_gw_node->refcount);
curr_gw_node = rcu_dereference_protected(bat_priv->gw.curr_gw, 1); rcu_assign_pointer(bat_priv->gw.curr_gw, new_gw_node);
On Saturday, March 05, 2016 16:09:22 Sven Eckelmann wrote:
batadv_gw_select requires that the caller already has a valid reference for new_gw_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/gateway_client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
Applied in revision e698321.
Thanks, Marek
batadv_gw_node_add requires that the caller already has a valid reference for orig_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- net/batman-adv/gateway_client.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c index bb1c4f3..5839c56 100644 --- a/net/batman-adv/gateway_client.c +++ b/net/batman-adv/gateway_client.c @@ -440,15 +440,11 @@ static void batadv_gw_node_add(struct batadv_priv *bat_priv, if (gateway->bandwidth_down == 0) return;
- if (!kref_get_unless_zero(&orig_node->refcount)) - return; - gw_node = kzalloc(sizeof(*gw_node), GFP_ATOMIC); - if (!gw_node) { - batadv_orig_node_put(orig_node); + if (!gw_node) return; - }
+ kref_get(&orig_node->refcount); INIT_HLIST_NODE(&gw_node->list); gw_node->orig_node = orig_node; gw_node->bandwidth_down = ntohl(gateway->bandwidth_down);
On Saturday, March 05, 2016 16:09:23 Sven Eckelmann wrote:
batadv_gw_node_add requires that the caller already has a valid reference for orig_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/gateway_client.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
Applied in revision 25a303d.
Thanks, Marek
_batadv_update_route requires that the caller already has a valid reference for neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- This patch requires https://patchwork.open-mesh.org/patch/15888/
net/batman-adv/routing.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index ffc4c09..acfd313 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -100,13 +100,12 @@ static void _batadv_update_route(struct batadv_priv *bat_priv, if (curr_router) batadv_neigh_node_put(curr_router);
- /* increase refcount of new best neighbor */ - if (neigh_node && !kref_get_unless_zero(&neigh_node->refcount)) - neigh_node = NULL;
spin_lock_bh(&orig_node->neigh_list_lock); curr_router = rcu_dereference_protected(orig_ifinfo->router, true);
+ if (neigh_node) + kref_get(&neigh_node->refcount); rcu_assign_pointer(orig_ifinfo->router, neigh_node); spin_unlock_bh(&orig_node->neigh_list_lock); batadv_orig_ifinfo_put(orig_ifinfo);
_batadv_update_route requires that the caller already has a valid reference for neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org --- This patch requires https://patchwork.open-mesh.org/patch/15888/
v2: - Remove one of the empty lines around the removed block
net/batman-adv/routing.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index ffc4c09..06581af 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -100,13 +100,11 @@ static void _batadv_update_route(struct batadv_priv *bat_priv, if (curr_router) batadv_neigh_node_put(curr_router);
- /* increase refcount of new best neighbor */ - if (neigh_node && !kref_get_unless_zero(&neigh_node->refcount)) - neigh_node = NULL; - spin_lock_bh(&orig_node->neigh_list_lock); curr_router = rcu_dereference_protected(orig_ifinfo->router, true);
+ if (neigh_node) + kref_get(&neigh_node->refcount); rcu_assign_pointer(orig_ifinfo->router, neigh_node); spin_unlock_bh(&orig_node->neigh_list_lock); batadv_orig_ifinfo_put(orig_ifinfo);
On Saturday, March 05, 2016 19:05:24 Sven Eckelmann wrote:
_batadv_update_route requires that the caller already has a valid reference for neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function
The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
This patch requires https://patchwork.open-mesh.org/patch/15888/
v2:
Remove one of the empty lines around the removed block
net/batman-adv/routing.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
Applied in revision 4390308.
Thanks, Marek
Hi Sven
Thanks for the patches. I will test them out of Monday and let you know if it fixes the crash.
Andrew
On Sat, Mar 05, 2016 at 04:09:16PM +0100, Sven Eckelmann wrote:
The batadv_hardif_list list is checked in many situations and the items in this list are given to specialized functions to modify the routing behavior. At the moment each of these called functions has to check itself whether the received batadv_hard_iface has a refcount > 0 before it can increase the reference counter and use it in other objects.
This can easily lead to problems because it is not easily visible where all callers of a function got the batadv_hard_iface object from and whether they already hold a valid reference.
Checking the reference counter directly before calling a subfunction with a pointer from the batadv_hardif_list avoids this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/bat_iv_ogm.c | 11 +++++++++++ net/batman-adv/bat_v_ogm.c | 14 +++++++++++++- net/batman-adv/originator.c | 5 +++++ net/batman-adv/send.c | 6 ++++++ 4 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 2c65668..b390ff9 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -985,9 +985,15 @@ static void batadv_iv_ogm_schedule(struct batadv_hard_iface *hard_iface) list_for_each_entry_rcu(tmp_hard_iface, &batadv_hardif_list, list) { if (tmp_hard_iface->soft_iface != hard_iface->soft_iface) continue;
if (!kref_get_unless_zero(&tmp_hard_iface->refcount))
continue;
- batadv_iv_ogm_queue_add(bat_priv, *ogm_buff, *ogm_buff_len, hard_iface, tmp_hard_iface, 1, send_time);
} rcu_read_unlock();batadv_hardif_put(tmp_hard_iface);
@@ -1767,8 +1773,13 @@ static void batadv_iv_ogm_process(const struct sk_buff *skb, int ogm_offset, if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
if (!kref_get_unless_zero(&hard_iface->refcount))
continue;
- batadv_iv_ogm_process_per_outif(skb, ogm_offset, orig_node, if_incoming, hard_iface);
} rcu_read_unlock();batadv_hardif_put(hard_iface);
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c index 4155fa5..473ebb9 100644 --- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -26,6 +26,7 @@ #include <linux/if_ether.h> #include <linux/jiffies.h> #include <linux/kernel.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/netdevice.h> #include <linux/random.h> @@ -176,6 +177,9 @@ static void batadv_v_ogm_send(struct work_struct *work) if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
if (!kref_get_unless_zero(&hard_iface->refcount))
continue;
- batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Sending own OGM2 packet (originator %pM, seqno %u, throughput %u, TTL %d) on interface %s [%pM]\n", ogm_packet->orig, ntohl(ogm_packet->seqno),
@@ -185,10 +189,13 @@ static void batadv_v_ogm_send(struct work_struct *work)
/* this skb gets consumed by batadv_v_ogm_send_to_if() */ skb_tmp = skb_clone(skb, GFP_ATOMIC);
if (!skb_tmp)
if (!skb_tmp) {
batadv_hardif_put(hard_iface); break;
}
batadv_v_ogm_send_to_if(skb_tmp, hard_iface);
batadv_hardif_put(hard_iface);
} rcu_read_unlock();
@@ -704,9 +711,14 @@ static void batadv_v_ogm_process(const struct sk_buff *skb, int ogm_offset, if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
if (!kref_get_unless_zero(&hard_iface->refcount))
continue;
- batadv_v_ogm_process_per_outif(bat_priv, ethhdr, ogm_packet, orig_node, neigh_node, if_incoming, hard_iface);
} rcu_read_unlock();batadv_hardif_put(hard_iface);
out: diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index e63d6a5..44c508a 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -1165,6 +1165,9 @@ static bool batadv_purge_orig_node(struct batadv_priv *bat_priv, if (hard_iface->soft_iface != bat_priv->soft_iface) continue;
if (!kref_get_unless_zero(&hard_iface->refcount))
continue;
- best_neigh_node = batadv_find_best_neighbor(bat_priv, orig_node, hard_iface);
@@ -1172,6 +1175,8 @@ static bool batadv_purge_orig_node(struct batadv_priv *bat_priv, best_neigh_node); if (best_neigh_node) batadv_neigh_node_put(best_neigh_node);
} rcu_read_unlock();batadv_hardif_put(hard_iface);
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index 3ce06e0..43a950e 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -26,6 +26,7 @@ #include <linux/if.h> #include <linux/jiffies.h> #include <linux/kernel.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/netdevice.h> #include <linux/printk.h> @@ -577,10 +578,15 @@ static void batadv_send_outstanding_bcast_packet(struct work_struct *work) if (forw_packet->num_packets >= hard_iface->num_bcasts) continue;
if (!kref_get_unless_zero(&hard_iface->refcount))
continue;
- /* send a copy of the saved skb */ skb1 = skb_clone(forw_packet->skb, GFP_ATOMIC); if (skb1) batadv_send_broadcast_skb(skb1, hard_iface);
} rcu_read_unlock();batadv_hardif_put(hard_iface);
-- 2.7.0
On Sunday 06 March 2016 01:08:23 Andrew Lunn wrote:
Hi Sven
Thanks for the patches. I will test them out of Monday and let you know if it fixes the crash.
The ones I've Cc'ed you are not really about fixing the crash. They are just to clean up the reference counting (which you've complained about). You can test them but I would doubt that they fix the problem. But they might help you to detect situation when the refcounter gets 0 but shouldn't.
There are also some related problems which I've already mentioned on the mailing list 9 months ago and never got an answer. I've moved them now to the issue tracker in hope that the authors of the feature will now reply. The unchecked double *_put with [2,3] together with the unchecked double list_adds [1] are a good way to mess up the reference counting of several objects (including the hard interfaces).
You may also want to add a real reference counting fix which I've also posted yesterday (but for non-hard-interface objects) [4].
And a small remark about your x86 without serial port. Two systems connected via usb-serial+null-model-cable+usb-serial work quite well for this type of tests. Add "console=tty0 console=ttyUSB0,115200,n8" to the bootargs of your test system and boot it up. The system used for gathering the kernel output just has to run "screen /dev/ttyUSB0 115200" and dump all the data.
Just a suggestion in case you didn't think about using this kind of setup.
Thanks for debugging this problem and have a nice weekend.
Kind regards, Sven
[1] https://www.open-mesh.org/issues/243 [2] https://www.open-mesh.org/issues/242 [3] https://www.open-mesh.org/issues/235 [4] https://patchwork.open-mesh.org/patch/15888/
And a small remark about your x86 without serial port. Two systems connected via usb-serial+null-model-cable+usb-serial work quite well for this type of tests. Add "console=tty0 console=ttyUSB0,115200,n8" to the bootargs of your test system and boot it up. The system used for gathering the kernel output just has to run "screen /dev/ttyUSB0 115200" and dump all the data.
Hi Sven
The box does have a couple of FTDI chips. But i've had mixed results with such a setup. Often the Opps does not get out of the USB subsystem before the box is dead. I can give it a try and see what happens in this case.
Andrew
On Saturday, March 05, 2016 16:09:16 Sven Eckelmann wrote:
The batadv_hardif_list list is checked in many situations and the items in this list are given to specialized functions to modify the routing behavior. At the moment each of these called functions has to check itself whether the received batadv_hard_iface has a refcount > 0 before it can increase the reference counter and use it in other objects.
This can easily lead to problems because it is not easily visible where all callers of a function got the batadv_hard_iface object from and whether they already hold a valid reference.
Checking the reference counter directly before calling a subfunction with a pointer from the batadv_hardif_list avoids this problem.
Signed-off-by: Sven Eckelmann sven@narfation.org
net/batman-adv/bat_iv_ogm.c | 11 +++++++++++ net/batman-adv/bat_v_ogm.c | 14 +++++++++++++- net/batman-adv/originator.c | 5 +++++ net/batman-adv/send.c | 6 ++++++ 4 files changed, 35 insertions(+), 1 deletion(-)
Applied in revision bd2dcf5.
Thanks, Marek
b.a.t.m.a.n@lists.open-mesh.org