Upon debugging a network with dangling, bogus TT entries, I found a couple of bugs for which I would like to propose fixes. The network showed the following symptoms:
* DAT was enabled * VLANs were used, but not every originator used the same VLANs * I've found global entries assigned to originators which actually never had the client in question connected to them. These false target originators didn't even had the VLANs in use or bridged for which the entry was done. This caused packets sent to be sent to these wrong originators. * The wrong entries also didn't get purged automatically, since they didn't announce the VLAN in question through their TT TLVLs, and the TT code didn't check for excess VLANs. * Furthermore, the temp flag was removed too early from the TT entries so that the purge function was not removing the entry after a timeout as well. * I've found that with DAT, the cached ARP replies may cause these TT entries to be created on behalf of the answering host. This is wrong, since the answering host (usually) doesn't actually has the client connected.
Three of these four patches fix various issues connected with the problem described above. The fourth one is merely a style fix which does not neccessarily have to be adopted to maint.
We have not yet tested these patches in the network, and the bug also appears only once in a couple of days. Therefore I'd like to ask to review these patches thoroughly, and if you agree to the fixes we will apply them in this production network.
Thanks, Simon
Simon Wunderlich (4): batman-adv: fix speedy join for DAT cache replies batman-adv: avoid keeping false temporary entry batman-adv: unify flags access style in tt global add batman-adv: detect local excess vlans in TT request
net/batman-adv/routing.c | 17 +++++++++++++---- net/batman-adv/translation-table.c | 33 ++++++++++++++++++++++++++++++--- 2 files changed, 43 insertions(+), 7 deletions(-)
DAT Cache replies are answered on behalf of other clients which are not connected to the answering originator. Therefore, we shouldn't add these clients to the answering originators TT table through speed join to avoid bogus entries.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de --- net/batman-adv/routing.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index c360c0c..f2779b6 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -836,6 +836,7 @@ int batadv_recv_unicast_packet(struct sk_buff *skb, uint8_t *orig_addr; struct batadv_orig_node *orig_node = NULL; int check, hdr_size = sizeof(*unicast_packet); + enum batadv_subtype subtype; bool is4addr;
unicast_packet = (struct batadv_unicast_packet *)skb->data; @@ -863,10 +864,18 @@ int batadv_recv_unicast_packet(struct sk_buff *skb, /* packet for me */ if (batadv_is_my_mac(bat_priv, unicast_packet->dest)) { if (is4addr) { - batadv_dat_inc_counter(bat_priv, - unicast_4addr_packet->subtype); - orig_addr = unicast_4addr_packet->src; - orig_node = batadv_orig_hash_find(bat_priv, orig_addr); + subtype = unicast_4addr_packet->subtype; + batadv_dat_inc_counter(bat_priv, subtype); + + /* Cache replies should not be considered for speedy + * join, since the clients do not actually reside at + * the originator. + */ + if (subtype != BATADV_P_DAT_CACHE_REPLY) { + orig_addr = unicast_4addr_packet->src; + orig_node = batadv_orig_hash_find(bat_priv, + orig_addr); + } }
if (batadv_dat_snoop_incoming_arp_request(bat_priv, skb,
On 21/08/15 17:15, Simon Wunderlich wrote:
/* Cache replies should not be considered for speedy
* join, since the clients do not actually reside at
* the originator.
*/
if (subtype != BATADV_P_DAT_CACHE_REPLY) {
Tha patch looks good but this should really be:
if (subtype == BATADV_P_DATA) {
because speedy join is supposed to work with any payload packet.
Cheers,
orig_addr = unicast_4addr_packet->src;
orig_node = batadv_orig_hash_find(bat_priv,
orig_addr);
}
}
if (batadv_dat_snoop_incoming_arp_request(bat_priv, skb,
Hi Antonio,
On Tuesday 25 August 2015 11:42:05 Antonio Quartulli wrote:
On 21/08/15 17:15, Simon Wunderlich wrote:
/* Cache replies should not be considered for speedy
* join, since the clients do not actually reside at
* the originator.
*/
if (subtype != BATADV_P_DAT_CACHE_REPLY) {
Tha patch looks good but this should really be:
if (subtype == BATADV_P_DATA) {
because speedy join is supposed to work with any payload packet.
OK, will make that change in PATCH v2.
Thanks, Simon
In the case when a temporary entry is added first and a proper tt entry is added after that, the temporary tt entry is kept in the orig list. However the temporary flag is removed at this point, and therefore the purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper one.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de --- net/batman-adv/translation-table.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 7986ec5..f629c21 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -1416,9 +1416,15 @@ static bool batadv_tt_global_add(struct batadv_priv *bat_priv, }
/* if the client was temporary added before receiving the first - * OGM announcing it, we have to clear the TEMP flag + * OGM announcing it, we have to clear the TEMP flag. Also, + * remove the previous temporary orig node and re-add it + * if required. If the orig entry changed, the new one which + * is a non-temporary entry is preferred. */ - common->flags &= ~BATADV_TT_CLIENT_TEMP; + if (common->flags & BATADV_TT_CLIENT_TEMP) { + batadv_tt_global_del_orig_list(tt_global_entry); + common->flags &= ~BATADV_TT_CLIENT_TEMP; + }
/* the change can carry possible "attribute" flags like the * TT_CLIENT_WIFI, therefore they have to be copied in the
On 21/08/15 17:15, Simon Wunderlich wrote:
In the case when a temporary entry is added first and a proper tt entry is added after that, the temporary tt entry is kept in the orig list. However the temporary flag is removed at this point, and therefore the purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper one.
[..]
/* if the client was temporary added before receiving the first
* OGM announcing it, we have to clear the TEMP flag
* OGM announcing it, we have to clear the TEMP flag. Also,
* remove the previous temporary orig node and re-add it
* if required. If the orig entry changed, the new one which
*/* is a non-temporary entry is preferred.
common->flags &= ~BATADV_TT_CLIENT_TEMP;
if (common->flags & BATADV_TT_CLIENT_TEMP) {
batadv_tt_global_del_orig_list(tt_global_entry);
common->flags &= ~BATADV_TT_CLIENT_TEMP;
}
mh...interesting..and nice catch. Have you tested this with a client roaming from A to B during the "speedy join" period?
Cheers,
Hi Antonio,
On Tuesday 25 August 2015 11:49:17 Antonio Quartulli wrote:
On 21/08/15 17:15, Simon Wunderlich wrote:
In the case when a temporary entry is added first and a proper tt entry is added after that, the temporary tt entry is kept in the orig list. However the temporary flag is removed at this point, and therefore the purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper one.
[..]
/* if the client was temporary added before receiving the first
* OGM announcing it, we have to clear the TEMP flag
* OGM announcing it, we have to clear the TEMP flag. Also,
* remove the previous temporary orig node and re-add it
* if required. If the orig entry changed, the new one which
* is a non-temporary entry is preferred.
*/
common->flags &= ~BATADV_TT_CLIENT_TEMP;
if (common->flags & BATADV_TT_CLIENT_TEMP) {
batadv_tt_global_del_orig_list(tt_global_entry);
common->flags &= ~BATADV_TT_CLIENT_TEMP;
}
mh...interesting..and nice catch. Have you tested this with a client roaming from A to B during the "speedy join" period?
No I didn't. I've just seen bad entries dangling and never getting cleaned up, and suspected that "speedy join" added TT the entry falsely. However, in that case it should only have been temporarily and getting removed later, which wasn't the case. Therefore this patch ...
I haven't tested this patch on the production network because its very hard to reproduce there, and the bug is probably squashed already with the first patch.
Thanks for reviewing! Simon
This should slightly improve readability
Signed-off-by: Simon Wunderlich sw@simonwunderlich.de --- net/batman-adv/translation-table.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index f629c21..dced2da 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -1430,7 +1430,7 @@ static bool batadv_tt_global_add(struct batadv_priv *bat_priv, * TT_CLIENT_WIFI, therefore they have to be copied in the * client entry */ - tt_global_entry->common.flags |= flags; + common->flags |= flags;
/* If there is the BATADV_TT_CLIENT_ROAM flag set, there is only * one originator left in the list and we previously received a
On 21/08/15 17:15, Simon Wunderlich wrote:
This should slightly improve readability
Patch looks good, but it is definitely *not* for maint.
Cheers,
On Tuesday 25 August 2015 11:51:24 Antonio Quartulli wrote:
On 21/08/15 17:15, Simon Wunderlich wrote:
This should slightly improve readability
Patch looks good, but it is definitely *not* for maint.
Cheers,
Shall I keep it in the patchset, and our friendly maintainer can merge it on master instead? Or do you prefer to send it separately?
Thanks, Simo
On 25/08/15 17:28, Simon Wunderlich wrote:
On Tuesday 25 August 2015 11:51:24 Antonio Quartulli wrote:
On 21/08/15 17:15, Simon Wunderlich wrote:
This should slightly improve readability
Patch looks good, but it is definitely *not* for maint.
Cheers,
Shall I keep it in the patchset, and our friendly maintainer can merge it on master instead? Or do you prefer to send it separately?
Up to the friendly maintainer. but if you ask me it is better to send this patch separately.
Cheers,
If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de --- net/batman-adv/translation-table.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index dced2da..1adb72e 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -2392,6 +2392,7 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, struct batadv_tvlv_tt_vlan_data *tt_vlan_tmp; struct batadv_orig_node_vlan *vlan; uint32_t crc; + bool found; int i;
/* check if each received CRC matches the locally stored one */ @@ -2418,6 +2419,26 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, return false; }
+ /* check if any excess VLANs exist locally for the originator + * which are not mentioned in the TVLV from the originator. + */ + rcu_read_lock(); + list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) { + found = false; + + for (i = 0; i < num_vlan; i++) { + tt_vlan_tmp = tt_vlan + i; + if (ntohs(tt_vlan_tmp->vid) == vlan->vid) { + found = true; + break; + } + } + + if (!found) + return false; + } + rcu_read_unlock(); + return true; }
On 21/08/15 17:15, Simon Wunderlich wrote:
If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de
net/batman-adv/translation-table.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index dced2da..1adb72e 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -2392,6 +2392,7 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, struct batadv_tvlv_tt_vlan_data *tt_vlan_tmp; struct batadv_orig_node_vlan *vlan; uint32_t crc;
bool found; int i;
/* check if each received CRC matches the locally stored one */
@@ -2418,6 +2419,26 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, return false; }
- /* check if any excess VLANs exist locally for the originator
* which are not mentioned in the TVLV from the originator.
*/
- rcu_read_lock();
- list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) {
found = false;
for (i = 0; i < num_vlan; i++) {
tt_vlan_tmp = tt_vlan + i;
if (ntohs(tt_vlan_tmp->vid) == vlan->vid) {
found = true;
break;
}
}
if (!found)
return false;
- }
- rcu_read_unlock();
NAK.
we already do this check slightly above in this function with the following code:
2426 vlan = batadv_orig_node_vlan_get(orig_node, 2427 ntohs(tt_vlan_tmp->vid)); 2428 if (!vlan) 2429 return false;
batadv_orig_node_vlan_get() returns NULL if we don't know this VLAN for that Originator, therefore the CRC check fails here.
Cheers,
return true; }
On Tuesday 25 August 2015 11:59:01 Antonio Quartulli wrote:
On 21/08/15 17:15, Simon Wunderlich wrote:
If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de
net/batman-adv/translation-table.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index dced2da..1adb72e 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -2392,6 +2392,7 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node,> struct batadv_tvlv_tt_vlan_data *tt_vlan_tmp; struct batadv_orig_node_vlan *vlan; uint32_t crc;
bool found;
int i;
/* check if each received CRC matches the locally stored one */
@@ -2418,6 +2419,26 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node,> return false;
}
- /* check if any excess VLANs exist locally for the originator
* which are not mentioned in the TVLV from the originator.
*/
- rcu_read_lock();
- list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) {
found = false;
for (i = 0; i < num_vlan; i++) {
tt_vlan_tmp = tt_vlan + i;
if (ntohs(tt_vlan_tmp->vid) == vlan->vid) {
found = true;
break;
}
}
if (!found)
return false;
- }
- rcu_read_unlock();
NAK.
we already do this check slightly above in this function with the following code:
2426 vlan = batadv_orig_node_vlan_get(orig_node, 2427 ntohs(tt_vlan_tmp->vid)); 2428 if (!vlan) 2429 return false;
batadv_orig_node_vlan_get() returns NULL if we don't know this VLAN for that Originator, therefore the CRC check fails here.
That's right, however it only sweeps through the VLANs announced within the TT-TVLV. However, my addition tries to check if there are any excess VLAN locally which are NOT in that TT-TVLV. I think this patch doesn't take care of that, or am I missing something?
For example, think of having VLAN 6 locally with a couple of global entries at the originator, but the TT-TVLV only announces VLANs 3,4,5. Then the fact that we also have VLAN 6 is not detected, and these (probably wrong) entries are never cleaned up.
Thanks! Simon
On 25/08/15 17:31, Simon Wunderlich wrote:
batadv_orig_node_vlan_get() returns NULL if we don't know this VLAN for that Originator, therefore the CRC check fails here.
That's right, however it only sweeps through the VLANs announced within the TT-TVLV. However, my addition tries to check if there are any excess VLAN locally which are NOT in that TT-TVLV. I think this patch doesn't take care of that, or am I missing something?
For example, think of having VLAN 6 locally with a couple of global entries at the originator, but the TT-TVLV only announces VLANs 3,4,5. Then the fact that we also have VLAN 6 is not detected, and these (probably wrong) entries are never cleaned up.
Right, this check is required, but what about just checking the VLAN count in the tt packet and in the originator struct ? if the number is different it means that there must be an excess in the local struct.
Cheers,
On Tuesday 25 August 2015 20:28:48 Antonio Quartulli wrote:
On 25/08/15 17:31, Simon Wunderlich wrote:
batadv_orig_node_vlan_get() returns NULL if we don't know this VLAN for that Originator, therefore the CRC check fails here.
That's right, however it only sweeps through the VLANs announced within the TT-TVLV. However, my addition tries to check if there are any excess VLAN locally which are NOT in that TT-TVLV. I think this patch doesn't take care of that, or am I missing something?
For example, think of having VLAN 6 locally with a couple of global entries at the originator, but the TT-TVLV only announces VLANs 3,4,5. Then the fact that we also have VLAN 6 is not detected, and these (probably wrong) entries are never cleaned up.
Right, this check is required, but what about just checking the VLAN count in the tt packet and in the originator struct ? if the number is different it means that there must be an excess in the local struct.
You are right, just counting would be way simpler than comparing for the right VLANs (since we don't use that information anyway). And this patch also did not unlock the rcu-lock properly ... I'll resend this one with the simpler method.
Thanks! Simon
b.a.t.m.a.n@lists.open-mesh.org