Hi Francesco,
On Tuesday, September 11, 2018 4:38:13 PM CEST Francesco Salvatore [fabbricadigitale] wrote:
Hi all, We're running a mesh network made of a cloud of clients and multiple gateways on two separate VLANs (on eth0, not on top of BATMAN). The setup is similar to the one described in the figure. https://www.open-mesh.org/attachments/download/132/Test_2xLAN.dia.png
We noticed that, sometimes, when new gateways are added to the already running infrastructure, network loops appear on the VLANs. We dumped the VLAN network traffic during one of these loops and saw a storm of BLA frames that collapsed the network. It seems that the frame (an ANNOUNCE one, in this case) was first generated by a gateway and started to loop inside the LAN, and then the other gateways propagated the same frame as well. After a few seconds, other frames (coming from different gateways) also started to loop.
Our hypothesis is that one of the gateways injects BLA frames directly into the mesh and that this leads to an unmanageable loop. So, we have 2 questions:
- Are BLA frames (except for LOOP DETECT) allowed to flow only on LAN?
Yes, all frames except LOOP DETECT are blocked in BATMAN
- If so, is our hypothesis reasonable?
You can see the situation described above in the screenshot below. http://oi63.tinypic.com/v7wl1w.jpg
Unfortunately the screenshot doesn't describe which packets looped exactly.
Are you sure it's an announce frame? It could also be a claim frame where two hosts try to claim hosts from each other.
BATMAN has a grace period: broadcasts from the LAN are allowed only after 1 minute of operation. This is done to make sure that the mesh is properly established and that other gateways and their claims are detected before traffic (at least potentially looping traffic) is allowed onto it. Therefore, you should make sure (e.g. in your firmware or setup scripts) that the LAN is operational once batman is brought up.
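As a rough illustration of such a check in a setup script, here is a minimal Python sketch that waits for the LAN interface to report link before the mesh interface is started. The helper names (`lan_is_up`, `wait_for_lan`), the timeout values, and the interface name are hypothetical. It assumes the LAN interface exposes the usual `/sys/class/net/<iface>/operstate` file; some virtual interfaces report `unknown` there, so the check may need adapting to your setup.

```python
import time
from pathlib import Path


def lan_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the interface's operstate file reads 'up'."""
    try:
        state = Path(sysfs_root, iface, "operstate").read_text().strip()
    except OSError:
        # Interface does not exist (yet) or sysfs is not readable.
        return False
    return state == "up"


def wait_for_lan(iface, timeout=30.0, poll=0.5, sysfs_root="/sys/class/net"):
    """Poll until the LAN interface is up or the timeout expires.

    Returns True if the interface came up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if lan_is_up(iface, sysfs_root):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A setup script could call `wait_for_lan("eth0")` (interface name assumed) and only bring up the batman-adv interface when it returns True.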
If the mesh isn't fully established, or it's actually split due to different channels or similar, then you may run into an unresolved limitation of BLA:
https://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II#...
For this reason we have the loop detect packets. If a loop is detected, a uevent is sent to userspace, and the firmware should react appropriately, e.g. by shutting down batman-adv.
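To make the userspace side a bit more concrete, here is a minimal Python sketch of how a handler might recognize such a uevent and choose a reaction. It assumes the event carries `BATTYPE=bla` and `BATACTION=loopdetect` in its environment; please verify these names against the batman-adv version you run. The function names, the `bat0` interface name, and the choice of `ip link set ... down` as the reaction are hypothetical. Receiving the event (e.g. via a hotplug script or a netlink socket) is left out.

```python
def parse_uevent(payload):
    """Split a NUL-separated kernel uevent payload into a KEY -> VALUE dict."""
    env = {}
    for field in payload.split(b"\0"):
        key, sep, value = field.decode("utf-8", "replace").partition("=")
        if sep:  # skip the "action@devpath" header and empty fields
            env[key] = value
    return env


def is_loop_detect(env):
    """True if the environment looks like a BLA loop detect event.

    The variable names are an assumption; check your kernel's batman-adv.
    """
    return env.get("BATTYPE") == "bla" and env.get("BATACTION") == "loopdetect"


def reaction_command(env, mesh_iface="bat0"):
    """Return the command to run for this event, or None to ignore it.

    Taking the mesh interface down is only one possible reaction.
    """
    if not is_loop_detect(env):
        return None
    return ["ip", "link", "set", "dev", mesh_iface, "down"]
```

The returned list can be passed to `subprocess.run()`; keeping the decision separate from the execution makes the handler easy to test without touching the network.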
Cheers, Simon