Hello David,
here you have three patches intended for net/linux-3.10.
Patch 1/3 is substituting a rtnl_trylock with a rtnl_lock. The trylock was used before to avoid deadlock, but since this situation has been fixed, the trylock is not necessary anymore. It only creates troubles every time batman-adv is initialised and the RTNL lock is taken by another component. Since the deadlock has been fixed in 3.8, I'd like to queue this patch for stable for 3.8+.
Patch 2/3 fixes an important point in our routing protocol, in particular in the duplicate control messages detection. In the right condition, the protocol prevents some control packets to be forwarded making some nodes in the network totally unreachable and so breaking the mesh. The fix, by Simon Wunderlich, enhances the duplicate message detection and avoids this situation (the change may look a bit big, but it is just because a variable has been changed from int to enum to better describe the duplicate packet event). Please, enqueue this patch for -stable because the bug is there since a while..
Patch 3/3 fixes the Bridge Loop Avoidance component by avoiding a particular routine to run even if the aforementioned feature is off. This bug leads to some useless packets to be sent over the mesh. This one is not critical and therefore does not need to be enqueued for stable.
Please pull or let me know if there is any problem!
Thanks a lot, Antonio
The following changes since commit 40edeff6e1c6f9a6f16536ae3375e3af9d648449:
net_sched: qdisc_get_rtab() must check data[] array (2013-06-07 15:24:04 -0700)
are available in the git repository at:
git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem
for you to fetch changes up to d5b4c93e67b0b1291aa8e2aaf694e40afc3412d0:
batman-adv: Don't handle address updates when bla is disabled (2013-06-10 08:42:18 +0200)
---------------------------------------------------------------- Included change: - fix "rtnl locked" concurrent executions by using rtnl_lock instead of rtnl_trylock. This fix enables batman-adv initialisation to do not fail just because somewhere else in the system another code path is holding the rtnl lock. It is easy to see the problem when batman-adv is trying to start together with other networking components. - fix the routing protocol forwarding policy by enhancing the duplicate control packet detection. When the right circumstances trigger the issue, some nodes in the network become totally unreachable, so breaking the mesh connectivity. - fix the Bridge Loop Avoidance component by not running the originator address change handling routine when the component is disabled. The routine was generating useless packets that were sent over the network.
---------------------------------------------------------------- Matthias Schiffer (1): batman-adv: wait for rtnl in batadv_store_mesh_iface instead of failing if it is taken
Simon Wunderlich (2): batman-adv: forward late OGMs from best next hop batman-adv: Don't handle address updates when bla is disabled
net/batman-adv/bat_iv_ogm.c | 86 ++++++++++++++++++++++------------ net/batman-adv/bridge_loop_avoidance.c | 4 ++ net/batman-adv/sysfs.c | 5 +- 3 files changed, 60 insertions(+), 35 deletions(-)
From: Matthias Schiffer mschiffer@universe-factory.net
The rtnl_lock in batadv_store_mesh_iface has been converted to a rtnl_trylock some time ago to avoid a possible deadlock between rtnl and s_active on removal of the sysfs nodes.
The behaviour introduced by that was quite confusing as it could lead to the sysfs store to fail, making batman-adv setup scripts unreliable. As recently the sysfs removal was postponed to a worker not running with the rtnl taken, the deadlock can't occur any more and it is safe to change the trylock back to a lock to make the sysfs store reliable again.
Signed-off-by: Matthias Schiffer mschiffer@universe-factory.net Reviewed-by: Simon Wunderlich siwu@hrz.tu-chemnitz.de Signed-off-by: Marek Lindner lindner_marek@yahoo.de Signed-off-by: Antonio Quartulli ordex@autistici.org --- net/batman-adv/sysfs.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 15a22ef..929e304 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -582,10 +582,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj, (strncmp(hard_iface->soft_iface->name, buff, IFNAMSIZ) == 0)) goto out;
- if (!rtnl_trylock()) { - ret = -ERESTARTSYS; - goto out; - } + rtnl_lock();
if (status_tmp == BATADV_IF_NOT_IN_USE) { batadv_hardif_disable_interface(hard_iface,
On Tue, 2013-06-11 at 08:34 +0200, Antonio Quartulli wrote:
From: Matthias Schiffer mschiffer@universe-factory.net
The rtnl_lock in batadv_store_mesh_iface has been converted to a rtnl_trylock some time ago to avoid a possible deadlock between rtnl and s_active on removal of the sysfs nodes.
The behaviour introduced by that was quite confusing as it could lead to the sysfs store to fail, making batman-adv setup scripts unreliable.
I think what you actually wanted was ERESTARTNOINTR. But the real problem is that neither of these error codes is valid unless the system call is aborted due to a signal.
Ben.
As recently the sysfs removal was postponed to a worker not running with the rtnl taken, the deadlock can't occur any more and it is safe to change the trylock back to a lock to make the sysfs store reliable again.
Signed-off-by: Matthias Schiffer mschiffer@universe-factory.net Reviewed-by: Simon Wunderlich siwu@hrz.tu-chemnitz.de Signed-off-by: Marek Lindner lindner_marek@yahoo.de Signed-off-by: Antonio Quartulli ordex@autistici.org
net/batman-adv/sysfs.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 15a22ef..929e304 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -582,10 +582,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj, (strncmp(hard_iface->soft_iface->name, buff, IFNAMSIZ) == 0)) goto out;
- if (!rtnl_trylock()) {
ret = -ERESTARTSYS;
goto out;
- }
rtnl_lock();
if (status_tmp == BATADV_IF_NOT_IN_USE) { batadv_hardif_disable_interface(hard_iface,
On Tue, Jun 11, 2013 at 03:09:30PM +0100, Ben Hutchings wrote:
On Tue, 2013-06-11 at 08:34 +0200, Antonio Quartulli wrote:
From: Matthias Schiffer mschiffer@universe-factory.net
The rtnl_lock in batadv_store_mesh_iface has been converted to a rtnl_trylock some time ago to avoid a possible deadlock between rtnl and s_active on removal of the sysfs nodes.
The behaviour introduced by that was quite confusing as it could lead to the sysfs store to fail, making batman-adv setup scripts unreliable.
I think what you actually wanted was ERESTARTNOINTR. But the real problem is that neither of these error codes is valid unless the system call is aborted due to a signal.
Well, it should have been handled differently from the beginning..the other problem was that ERESTARTSYS was propagated to userspace while this is not supposed to happen.
However, with this patch the initialisation does not fail anymore. So we don't need to care about the error code anymore :-)
Cheers,
From: Simon Wunderlich simon@open-mesh.com
When a packet is received from another node first and later from the best next hop, this packet is dropped. However the first OGM was sent with the BATADV_NOT_BEST_NEXT_HOP flag and thus dropped by neighbors. The late OGM from the best neighbor is then dropped because it is a duplicate.
If this situation happens constantly, a node might end up not forwarding the "valid" OGMs anymore, and nodes behind will starve from not getting valid OGMs.
Fix this by refining the duplicate checking behaviour: The actions should depend on whether it was a duplicate for a neighbor only or for the originator. OGMs which are not duplicates for a specific neighbor will now be considered in batadv_iv_ogm_forward(), but only actually forwarded for the best next hop. Therefore, late OGMs from the best next hop are forwarded now and not dropped as duplicates anymore.
Signed-off-by: Simon Wunderlich simon@open-mesh.com Signed-off-by: Marek Lindner lindner_marek@yahoo.de Signed-off-by: Antonio Quartulli ordex@autistici.org --- net/batman-adv/bat_iv_ogm.c | 86 +++++++++++++++++++++++++++++---------------- 1 file changed, 55 insertions(+), 31 deletions(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 071f288..f680ee1 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -29,6 +29,21 @@ #include "bat_algo.h" #include "network-coding.h"
+/** + * batadv_dup_status - duplicate status + * @BATADV_NO_DUP: the packet is a duplicate + * @BATADV_ORIG_DUP: OGM is a duplicate in the originator (but not for the + * neighbor) + * @BATADV_NEIGH_DUP: OGM is a duplicate for the neighbor + * @BATADV_PROTECTED: originator is currently protected (after reboot) + */ +enum batadv_dup_status { + BATADV_NO_DUP = 0, + BATADV_ORIG_DUP, + BATADV_NEIGH_DUP, + BATADV_PROTECTED, +}; + static struct batadv_neigh_node * batadv_iv_ogm_neigh_new(struct batadv_hard_iface *hard_iface, const uint8_t *neigh_addr, @@ -650,7 +665,7 @@ batadv_iv_ogm_orig_update(struct batadv_priv *bat_priv, const struct batadv_ogm_packet *batadv_ogm_packet, struct batadv_hard_iface *if_incoming, const unsigned char *tt_buff, - int is_duplicate) + enum batadv_dup_status dup_status) { struct batadv_neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL; struct batadv_neigh_node *router = NULL; @@ -676,7 +691,7 @@ batadv_iv_ogm_orig_update(struct batadv_priv *bat_priv, continue; }
- if (is_duplicate) + if (dup_status != BATADV_NO_DUP) continue;
spin_lock_bh(&tmp_neigh_node->lq_update_lock); @@ -718,7 +733,7 @@ batadv_iv_ogm_orig_update(struct batadv_priv *bat_priv, neigh_node->tq_avg = batadv_ring_buffer_avg(neigh_node->tq_recv); spin_unlock_bh(&neigh_node->lq_update_lock);
- if (!is_duplicate) { + if (dup_status == BATADV_NO_DUP) { orig_node->last_ttl = batadv_ogm_packet->header.ttl; neigh_node->last_ttl = batadv_ogm_packet->header.ttl; } @@ -902,15 +917,16 @@ out: return ret; }
-/* processes a batman packet for all interfaces, adjusts the sequence number and - * finds out whether it is a duplicate. - * returns: - * 1 the packet is a duplicate - * 0 the packet has not yet been received - * -1 the packet is old and has been received while the seqno window - * was protected. Caller should drop it. +/** + * batadv_iv_ogm_update_seqnos - process a batman packet for all interfaces, + * adjust the sequence number and find out whether it is a duplicate + * @ethhdr: ethernet header of the packet + * @batadv_ogm_packet: OGM packet to be considered + * @if_incoming: interface on which the OGM packet was received + * + * Returns duplicate status as enum batadv_dup_status */ -static int +static enum batadv_dup_status batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, const struct batadv_ogm_packet *batadv_ogm_packet, const struct batadv_hard_iface *if_incoming) @@ -918,17 +934,18 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, struct batadv_priv *bat_priv = netdev_priv(if_incoming->soft_iface); struct batadv_orig_node *orig_node; struct batadv_neigh_node *tmp_neigh_node; - int is_duplicate = 0; + int is_dup; int32_t seq_diff; int need_update = 0; - int set_mark, ret = -1; + int set_mark; + enum batadv_dup_status ret = BATADV_NO_DUP; uint32_t seqno = ntohl(batadv_ogm_packet->seqno); uint8_t *neigh_addr; uint8_t packet_count;
orig_node = batadv_get_orig_node(bat_priv, batadv_ogm_packet->orig); if (!orig_node) - return 0; + return BATADV_NO_DUP;
spin_lock_bh(&orig_node->ogm_cnt_lock); seq_diff = seqno - orig_node->last_real_seqno; @@ -936,22 +953,29 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, /* signalize caller that the packet is to be dropped. */ if (!hlist_empty(&orig_node->neigh_list) && batadv_window_protected(bat_priv, seq_diff, - &orig_node->batman_seqno_reset)) + &orig_node->batman_seqno_reset)) { + ret = BATADV_PROTECTED; goto out; + }
rcu_read_lock(); hlist_for_each_entry_rcu(tmp_neigh_node, &orig_node->neigh_list, list) { - is_duplicate |= batadv_test_bit(tmp_neigh_node->real_bits, - orig_node->last_real_seqno, - seqno); - neigh_addr = tmp_neigh_node->addr; + is_dup = batadv_test_bit(tmp_neigh_node->real_bits, + orig_node->last_real_seqno, + seqno); + if (batadv_compare_eth(neigh_addr, ethhdr->h_source) && - tmp_neigh_node->if_incoming == if_incoming) + tmp_neigh_node->if_incoming == if_incoming) { set_mark = 1; - else + if (is_dup) + ret = BATADV_NEIGH_DUP; + } else { set_mark = 0; + if (is_dup && (ret != BATADV_NEIGH_DUP)) + ret = BATADV_ORIG_DUP; + }
/* if the window moved, set the update flag. */ need_update |= batadv_bit_get_packet(bat_priv, @@ -971,8 +995,6 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, orig_node->last_real_seqno = seqno; }
- ret = is_duplicate; - out: spin_unlock_bh(&orig_node->ogm_cnt_lock); batadv_orig_node_free_ref(orig_node); @@ -994,7 +1016,8 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, int is_broadcast = 0, is_bidirect; bool is_single_hop_neigh = false; bool is_from_best_next_hop = false; - int is_duplicate, sameseq, simlar_ttl; + int sameseq, similar_ttl; + enum batadv_dup_status dup_status; uint32_t if_incoming_seqno; uint8_t *prev_sender;
@@ -1138,10 +1161,10 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, if (!orig_node) return;
- is_duplicate = batadv_iv_ogm_update_seqnos(ethhdr, batadv_ogm_packet, - if_incoming); + dup_status = batadv_iv_ogm_update_seqnos(ethhdr, batadv_ogm_packet, + if_incoming);
- if (is_duplicate == -1) { + if (dup_status == BATADV_PROTECTED) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Drop packet: packet within seqno protection time (sender: %pM)\n", ethhdr->h_source); @@ -1211,11 +1234,12 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, * seqno and similar ttl as the non-duplicate */ sameseq = orig_node->last_real_seqno == ntohl(batadv_ogm_packet->seqno); - simlar_ttl = orig_node->last_ttl - 3 <= batadv_ogm_packet->header.ttl; - if (is_bidirect && (!is_duplicate || (sameseq && simlar_ttl))) + similar_ttl = orig_node->last_ttl - 3 <= batadv_ogm_packet->header.ttl; + if (is_bidirect && ((dup_status == BATADV_NO_DUP) || + (sameseq && similar_ttl))) batadv_iv_ogm_orig_update(bat_priv, orig_node, ethhdr, batadv_ogm_packet, if_incoming, - tt_buff, is_duplicate); + tt_buff, dup_status);
/* is single hop (direct) neighbor */ if (is_single_hop_neigh) { @@ -1236,7 +1260,7 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, goto out_neigh; }
- if (is_duplicate) { + if (dup_status == BATADV_NEIGH_DUP) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Drop packet: duplicate packet received\n"); goto out_neigh;
From: Simon Wunderlich simon@open-mesh.com
The bridge loop avoidance has a hook to handle address updates of the originator. These should not be handled when bridge loop avoidance is disabled - it might send some bridge loop avoidance packets which should not appear if bla is disabled.
Signed-off-by: Simon Wunderlich simon@open-mesh.com Signed-off-by: Marek Lindner lindner_marek@yahoo.de Signed-off-by: Antonio Quartulli ordex@autistici.org --- net/batman-adv/bridge_loop_avoidance.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c index 379061c..de27b31 100644 --- a/net/batman-adv/bridge_loop_avoidance.c +++ b/net/batman-adv/bridge_loop_avoidance.c @@ -1067,6 +1067,10 @@ void batadv_bla_update_orig_address(struct batadv_priv *bat_priv, group = htons(crc16(0, primary_if->net_dev->dev_addr, ETH_ALEN)); bat_priv->bla.claim_dest.group = group;
+ /* purge everything when bridge loop avoidance is turned off */ + if (!atomic_read(&bat_priv->bridge_loop_avoidance)) + oldif = NULL; + if (!oldif) { batadv_bla_purge_claims(bat_priv, NULL, 1); batadv_bla_purge_backbone_gw(bat_priv, 1);
Hello David,
On Tue, Jun 11, 2013 at 08:34:58AM +0200, Antonio Quartulli wrote:
[...]
Please pull or let me know if there is any problem!
After having applied this patchset the merge of net into net-next will generate a conflict. Here you have the needed information to properly solve it.
Regards,
conflict in net/batman-adv/bat_iv_ogm.c:
++<<<<<<< HEAD + * batadv_ring_buffer_set - update the ring buffer with the given value + * @lq_recv: pointer to the ring buffer + * @lq_index: index to store the value at + * @value: value to store in the ring buffer + */ +static void batadv_ring_buffer_set(uint8_t lq_recv[], uint8_t *lq_index, + uint8_t value) +{ + lq_recv[*lq_index] = value; + *lq_index = (*lq_index + 1) % BATADV_TQ_GLOBAL_WINDOW_SIZE; +} + +/** + * batadv_ring_buffer_set - compute the average of all non-zero values stored + * in the given ring buffer + * @lq_recv: pointer to the ring buffer + * + * Returns computed average value. + */ +static uint8_t batadv_ring_buffer_avg(const uint8_t lq_recv[]) +{ + const uint8_t *ptr; + uint16_t count = 0, i = 0, sum = 0; + + ptr = lq_recv; + + while (i < BATADV_TQ_GLOBAL_WINDOW_SIZE) { + if (*ptr != 0) { + count++; + sum += *ptr; + } + + i++; + ptr++; + } + + if (count == 0) + return 0; + + return (uint8_t)(sum / count); +} ++======= + * batadv_dup_status - duplicate status + * @BATADV_NO_DUP: the packet is a duplicate + * @BATADV_ORIG_DUP: OGM is a duplicate in the originator (but not for the + * neighbor) + * @BATADV_NEIGH_DUP: OGM is a duplicate for the neighbor + * @BATADV_PROTECTED: originator is currently protected (after reboot) + */ + enum batadv_dup_status { + BATADV_NO_DUP = 0, + BATADV_ORIG_DUP, + BATADV_NEIGH_DUP, + BATADV_PROTECTED, + }; + ++>>>>>>> 7c24bbb... batman-adv: forward late OGMs from best next hop
is solved as follows (add /** at the top of the batadv_dup_status kerneldoc and keep everything):
<<<<<<<<<<<< * batadv_ring_buffer_set - update the ring buffer with the given value * @lq_recv: pointer to the ring buffer * @lq_index: index to store the value at * @value: value to store in the ring buffer */ static void batadv_ring_buffer_set(uint8_t lq_recv[], uint8_t *lq_index, uint8_t value) { lq_recv[*lq_index] = value; *lq_index = (*lq_index + 1) % BATADV_TQ_GLOBAL_WINDOW_SIZE; }
/** * batadv_ring_buffer_set - compute the average of all non-zero values stored * in the given ring buffer * @lq_recv: pointer to the ring buffer * * Returns computed average value. */ static uint8_t batadv_ring_buffer_avg(const uint8_t lq_recv[]) { const uint8_t *ptr; uint16_t count = 0, i = 0, sum = 0;
ptr = lq_recv;
while (i < BATADV_TQ_GLOBAL_WINDOW_SIZE) { if (*ptr != 0) { count++; sum += *ptr; }
i++; ptr++; }
if (count == 0) return 0;
return (uint8_t)(sum / count); }
/** * batadv_dup_status - duplicate status * @BATADV_NO_DUP: the packet is a duplicate * @BATADV_ORIG_DUP: OGM is a duplicate in the originator (but not for the * neighbor) * @BATADV_NEIGH_DUP: OGM is a duplicate for the neighbor * @BATADV_PROTECTED: originator is currently protected (after reboot) */ enum batadv_dup_status { BATADV_NO_DUP = 0, BATADV_ORIG_DUP, BATADV_NEIGH_DUP, BATADV_PROTECTED, };
>>>>>>>>
From: Antonio Quartulli ordex@autistici.org Date: Tue, 11 Jun 2013 08:34:58 +0200
here you have three patches intended for net/linux-3.10.
Pulled, thanks Antonio.
b.a.t.m.a.n@lists.open-mesh.org