This is the third iteration of the patches, with the following changes (thanks Antonio for review): * add Antonios ack to first patch * extend commit message for second patch
Original (slightly updated) description here:
Upon debugging a network with dangling, bogus TT entries, I found a couple of bugs for which I would like to propose fixes. The network showed the following symptoms:
* DAT was enabled * VLANs were used, but not every originator used the same VLANs * I've found global entries assigned to originators which actually never had the client in question connected to them. These false target originators didn't even had the VLANs in use or bridged for which the entry was done. This caused packets sent to be sent to these wrong originators. * The wrong entries also didn't get purged automatically, since they didn't announce the VLAN in question through their TT TLVLs, and the TT code didn't check for excess VLANs. * Furthermore, the temp flag was removed too early from the TT entries so that the purge function was not removing the entry after a timeout as well. * I've found that with DAT, the cached ARP replies may cause these TT entries to be created on behalf of the answering host. This is wrong, since the answering host (usually) doesn't actually has the client connected.
We have not yet tested these patches in the network, and the bug also appears only once in a couple of days. Therefore I'd like to ask to review these patches thoroughly, and if you agree to the fixes we will apply them in this production network.
Thanks, Simon
Simon Wunderlich (3): batman-adv: fix speedy join for DAT cache replies batman-adv: avoid keeping false temporary entry batman-adv: detect local excess vlans in TT request
net/batman-adv/routing.c | 19 +++++++++++++++---- net/batman-adv/translation-table.c | 24 +++++++++++++++++++++--- 2 files changed, 36 insertions(+), 7 deletions(-)
DAT Cache replies are answered on behalf of other clients which are not connected to the answering originator. Therefore, we shouldn't add these clients to the answering originators TT table through speed join to avoid bogus entries.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Acked-by: Antonio Quartulli antonio@meshcoding.com
--- Changes to PATCH: * only allow speedy join for BATADV_P_DATA to avoid adding more corner cases in the future --- net/batman-adv/routing.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index c360c0c..96b5daa 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -836,6 +836,7 @@ int batadv_recv_unicast_packet(struct sk_buff *skb, uint8_t *orig_addr; struct batadv_orig_node *orig_node = NULL; int check, hdr_size = sizeof(*unicast_packet); + enum batadv_subtype subtype; bool is4addr;
unicast_packet = (struct batadv_unicast_packet *)skb->data; @@ -863,10 +864,20 @@ int batadv_recv_unicast_packet(struct sk_buff *skb, /* packet for me */ if (batadv_is_my_mac(bat_priv, unicast_packet->dest)) { if (is4addr) { - batadv_dat_inc_counter(bat_priv, - unicast_4addr_packet->subtype); - orig_addr = unicast_4addr_packet->src; - orig_node = batadv_orig_hash_find(bat_priv, orig_addr); + subtype = unicast_4addr_packet->subtype; + batadv_dat_inc_counter(bat_priv, subtype); + + /* Only payload data should be considered for speedy + * join. For example, DAT also uses unicast 4addr + * types, but those packets should not be considered + * for speedy join, since the clients do not actually + * reside at the sending originator. + */ + if (subtype == BATADV_P_DATA) { + orig_addr = unicast_4addr_packet->src; + orig_node = batadv_orig_hash_find(bat_priv, + orig_addr); + } }
if (batadv_dat_snoop_incoming_arp_request(bat_priv, skb,
On Wednesday, September 02, 2015 20:09:54 Simon Wunderlich wrote:
DAT Cache replies are answered on behalf of other clients which are not connected to the answering originator. Therefore, we shouldn't add these clients to the answering originators TT table through speed join to avoid bogus entries.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Acked-by: Antonio Quartulli antonio@meshcoding.com
Changes to PATCH:
- only allow speedy join for BATADV_P_DATA to avoid adding more corner cases in the future
net/batman-adv/routing.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
Applied in revision 2decb5f.
Thanks, Marek
In the case when a temporary entry is added first and a proper tt entry is added after that, the temporary tt entry is kept in the orig list. However the temporary flag is removed at this point, and therefore the purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper one.
This case can happen if a client behind a given originator moves before the TT announcement is sent out. Other than that, this case can also be created by bogus or malicious payload frames for VLANs which are not existent on the sending originator.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de --- Changes to PATCHv2:
* extend commit message to explain scenario --- net/batman-adv/translation-table.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 7986ec5..f629c21 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -1416,9 +1416,15 @@ static bool batadv_tt_global_add(struct batadv_priv *bat_priv, }
/* if the client was temporary added before receiving the first - * OGM announcing it, we have to clear the TEMP flag + * OGM announcing it, we have to clear the TEMP flag. Also, + * remove the previous temporary orig node and re-add it + * if required. If the orig entry changed, the new one which + * is a non-temporary entry is preferred. */ - common->flags &= ~BATADV_TT_CLIENT_TEMP; + if (common->flags & BATADV_TT_CLIENT_TEMP) { + batadv_tt_global_del_orig_list(tt_global_entry); + common->flags &= ~BATADV_TT_CLIENT_TEMP; + }
/* the change can carry possible "attribute" flags like the * TT_CLIENT_WIFI, therefore they have to be copied in the
On 02/09/15 20:09, Simon Wunderlich wrote:
In the case when a temporary entry is added first and a proper tt entry is added after that, the temporary tt entry is kept in the orig list. However the temporary flag is removed at this point, and therefore the purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper one.
This case can happen if a client behind a given originator moves before the TT announcement is sent out. Other than that, this case can also be created by bogus or malicious payload frames for VLANs which are not existent on the sending originator.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de
Acked-by: Antonio Quartulli antonio@meshcoding.com
On Wednesday, September 02, 2015 20:11:01 Antonio Quartulli wrote:
On 02/09/15 20:09, Simon Wunderlich wrote:
In the case when a temporary entry is added first and a proper tt entry is added after that, the temporary tt entry is kept in the orig list. However the temporary flag is removed at this point, and therefore the purge function can not find this temporary entry anymore.
Therefore, remove the previous temp entry before adding the new proper one.
This case can happen if a client behind a given originator moves before the TT announcement is sent out. Other than that, this case can also be created by bogus or malicious payload frames for VLANs which are not existent on the sending originator.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de
Acked-by: Antonio Quartulli antonio@meshcoding.com
Applied in revision 4a73d74.
Thanks, Marek
If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de --- Changes to PATCH: * count instead of explicitly comparing VLANs, which makes it much simpler (thank Antonio) --- net/batman-adv/translation-table.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index f629c21..e7f2998 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -2391,8 +2391,8 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, { struct batadv_tvlv_tt_vlan_data *tt_vlan_tmp; struct batadv_orig_node_vlan *vlan; + int i, orig_num_vlan; uint32_t crc; - int i;
/* check if each received CRC matches the locally stored one */ for (i = 0; i < num_vlan; i++) { @@ -2418,6 +2418,18 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, return false; }
+ /* check if any excess VLANs exist locally for the originator + * which are not mentioned in the TVLV from the originator. + */ + rcu_read_lock(); + orig_num_vlan = 0; + list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) + orig_num_vlan++; + rcu_read_unlock(); + + if (orig_num_vlan > num_vlan) + return false; + return true; }
On 02/09/15 20:09, Simon Wunderlich wrote:
If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de
Acked-by: Antonio Quartulli antonio@meshcoding.com
On Wednesday, September 02, 2015 20:12:16 Antonio Quartulli wrote:
On 02/09/15 20:09, Simon Wunderlich wrote:
If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process.
Reported-by: Alessandro Bolletta alessandro@mediaspot.net Signed-off-by: Simon Wunderlich sw@simonwunderlich.de
Acked-by: Antonio Quartulli antonio@meshcoding.com
Applied in revision 2dd1d9f.
Thanks, Marek
b.a.t.m.a.n@lists.open-mesh.org