Hey Gabriel,
thanks for bringing the discussion to the batman ml and giving some constructive input. I wrote this bonding/alternating feature some time ago, and we released it at WBMv3 together with the little documentation to be found in the wiki. Actually, I considered the feature rather simple and therefore did not write too much about it - because there is not really much to write about, or so I thought. Obviously, some things were unclear, so thanks for pointing me/us to that.
When implementing, it is easy to miss some things that are not that obvious for outsiders, so please feel free to ask or suggest things. We'll rework the bonding/interface alternating part in the next days, and would be happy to include your suggestions. :)
Usually, we write the protocol documentation for review and as a reference for other batman-adv devs - and we don't expect them all to fall on their heads at the same time. These documents are meant to describe the concept, not the actual implementation with all its nasty details.
On Wed, Mar 07, 2012 at 11:18:48PM +0100, Gabriel Kerneis wrote:
[CC: b.a.t.m.a.n@lists.open-mesh.org, see note 3 in particular]
Antonio,
On Wed, Mar 07, 2012 at 06:17:52PM +0100, Antonio Quartulli wrote:
Technical details about what? Interface-alternating? It is there! Gabriel wrote the link.
No. Please re-read my email carefully. The wiki contains a rough explanation of the general principle (ie. “same interface = bad, different interface = good”). Not the actual algorithm used by batman-adv (quoting from the wiki: “the algorithm tries to avoid forwarding packets on the interface which just received the packet”).
Note that the wiki has been updated since then, by Simon with a few more details [1], and by Marek with benchmark results from WBMv3.
Maybe "algorithm" is a big word for a little feature like that. The bonding and interface alternating basically work in two steps:
1) detect that a neighbor is reachable via two different links
2) use the two different links for various manipulations (bonding, interface alternation)
1) The detection part is batman-specific, we use the PRIMARIES_FIRST_HOP flag to do that. As a reminder (that might be documented somewhere else):
* OGMs from the primary interface are broadcast on ALL interfaces and are spread over the mesh (big TTL) --> these get the PRIMARIES_FIRST_HOP flag, which is cleared when forwarded by other nodes
* OGMs from the secondary interfaces are only broadcast on their respective interface and are only used for local link sensing (TTL = 1)
When we receive OGMs from the same originator with the PRIMARIES_FIRST_HOP flag set on different interfaces, we know that they came from the same neighbor, just via different interfaces. We have two links to this neighbor.
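To make the detection step a bit more concrete, here is a minimal sketch in Python (not the kernel code). The `Ogm` record, the `LinkDetector` class and all field names are made up for illustration; it only remembers on which local interfaces an OGM with the first-hop flag still set was heard for a given primary address:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ogm:
    # Hypothetical, simplified view of a received OGM.
    primary_addr: str          # primary address of the originator
    primaries_first_hop: bool  # still set => heard directly from the sender


class LinkDetector:
    def __init__(self):
        # primary address -> local interfaces it was heard on directly
        self.links = defaultdict(set)

    def receive(self, ogm: Ogm, recv_iface: str) -> None:
        # Forwarded OGMs have the flag cleared and tell us nothing
        # about direct links, so they are ignored here.
        if ogm.primaries_first_hop:
            self.links[ogm.primary_addr].add(recv_iface)

    def has_multiple_links(self, primary_addr: str) -> bool:
        return len(self.links.get(primary_addr, ())) >= 2
```

A neighbor heard with the flag set on both wlan0 and wlan1 would then count as reachable via two links.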
2) The manipulation step is independent of the routing protocol, as long as the routing protocol routes packets based on their destination and does not care which interface they came in on.
Because we already made our routing decision (we have chosen a neighbor), it does not matter on which link we send the frame. We use this freedom to either send on a different interface than the one the frame came in on (interface alternation) or to round-robin over the available, detected links (bonding). Note that this would work on any routing protocol and is independent of the BATMAN routing.
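For the bonding half, a rough Python sketch of round-robin over the detected links could look like this. The `BondingSelector` name and the use of plain interface names are invented for the example; the real code works on neighbor structures, not strings:

```python
class BondingSelector:
    """Round-robin over the links detected towards one already chosen neighbor."""

    def __init__(self, links):
        if not links:
            raise ValueError("need at least one link")
        self.links = list(links)
        self.next_idx = 0

    def pick(self) -> str:
        # Return the current link, then advance (wrapping around at the end).
        link = self.links[self.next_idx]
        self.next_idx = (self.next_idx + 1) % len(self.links)
        return link
```

The routing decision (which neighbor) is made before this runs; the selector only decides which of the equivalent links carries the next frame.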
However, we need the fact that we are on layer 2 and can decide on the packet link usage in batman-adv. This would not work so easily with static layer 3 routing tables, I suppose.
Gabriel said he has not enough time to look into it. I'm sorry, but I don't think this is a good reason to blame batman-adv devs :P
I finally decided to settle this issue and spent my breakfast reading batman-adv/routing.c [2] instead of my favorite newspaper. Here is what I understood:
At all times, batman-adv maintains a list of "bonding candidates" for each node (bonding_candidate_add, called from bat_iv_ogm.c:699). Some node "neigh" is a bonding candidate for another node "orig" if and only if:
- neigh and orig have the same primary address, ie. are in fact the same router,
that's right - we are talking about one neighbor, and the bonding candidates are the available links to this neighbor.
- the links to reach them have the same quality up to some additive constant (BONDING_TQ_THRESHOLD = 50) [3],
Yep, it would be useless if we can reach one link perfectly and the other one is dropping all the packets. We want similar TQ quality.
- orig does not already have another bonding candidate for the same interface, because it could interfere – but what if the neigh has a better link quality, isn’t it a pity to ignore it?
If it had a better quality, it would have been chosen as router already - at least we expect that here. Maybe this is a little rough, but using the same interface/frequency is far worse, IMHO.
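Putting the three criteria together, a simplified Python sketch might look like the following. This is not the code of bonding_candidate_add; the `Neigh` record and the `is_bonding_candidate` helper are hypothetical, and the one-sided quality check is an assumption based on note [3]:

```python
from dataclasses import dataclass

BONDING_TQ_THRESHOLD = 50  # value quoted above


@dataclass
class Neigh:
    # Hypothetical, simplified neighbor entry.
    primary_addr: str
    iface: str
    tq_avg: int


def is_bonding_candidate(neigh, router, orig_primary_addr, candidates):
    # 1. Must be the same router, i.e. share the primary address.
    if neigh.primary_addr != orig_primary_addr:
        return False
    # 2. Link quality must not be much worse than the chosen router's
    #    (assumed one-sided check, see note [3]).
    if neigh.tq_avg < router.tq_avg - BONDING_TQ_THRESHOLD:
        return False
    # 3. At most one candidate per interface, to avoid interference.
    if any(c.iface == neigh.iface for c in candidates):
        return False
    return True
```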
Then, assuming that "interface alternating" is enabled, the list of bonding candidates is used on every route selection (find_ifalter_router, called from routing.c:769).
That's right. Interface alternating is always enabled, BTW.
More precisely, once batman has chosen a next-hop router for a packet based on its classical routing algorithm, it walks the list of the bonding candidates associated to the primary interface for this router [4]. It selects the actual next-hop on the following criteria:
- it must not be on the same interface as the packet came in,
- its quality must be as high as possible (given the previous constraint).
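The selection described here can be sketched in a few lines of Python. This is an illustration of the two criteria, not a copy of find_ifalter_router; the `Candidate` record and `pick_ifalter` are invented names, and the fallback when no other interface is available is an assumption:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    # Hypothetical bonding candidate: one link to the chosen neighbor.
    iface: str
    tq_avg: int


def pick_ifalter(router: Candidate,
                 candidates: Iterable[Candidate],
                 recv_iface: Optional[str]) -> Candidate:
    # Only links on a different interface than the incoming one qualify.
    usable = [c for c in candidates if c.iface != recv_iface]
    if not usable:
        # Assumed fallback: keep the router picked by normal routing.
        return router
    # Among the remaining links, take the one with the best quality.
    return max(usable, key=lambda c: c.tq_avg)
```

For a locally generated packet there is no incoming interface (`recv_iface=None` in this sketch), so every candidate is usable and the best one wins.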
This is the kind of explanation I would have loved to find on the wiki. By the way, consider it public domain and feel free to copy/paste/correct it if you wish.
Thanks for sharing your explanation. I will happily include it on the rework of this section.
It is still not clear to me exactly why this works, but I believe this is what the code does, and is definitely easier to discuss than generic, unsubstantiated claims.
Best regards, Gabriel
[1] “Interface alternating is only performed if the two candidate links to the next hop have a similar quality.” http://www.open-mesh.org/wiki/batman-adv/Multi-link-optimize
[2] http://www.open-mesh.org/projects/batman-adv/repository/revisions/master/ent...
[3] By the way, there is something I don’t understand: neigh_node->tq_avg will be accepted even if it is far greater than router->tq_avg + BONDING_TQ_THRESHOLD. Shouldn’t it be: abs(neigh_node->tq_avg - router->tq_avg) > BONDING_TQ_THRESHOLD? http://www.open-mesh.org/projects/batman-adv/repository/revisions/master/ent...
We expect that router->tq_avg is already the highest, so neigh_node->tq_avg shouldn't be (far) higher than router->tq_avg.
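To show where the two variants differ, here is a tiny Python comparison. Both function names are made up, and the one-sided form is only my reading of what note [3] describes, written as an acceptance test:

```python
BONDING_TQ_THRESHOLD = 50


def accepted_one_sided(neigh_tq: int, router_tq: int) -> bool:
    # Rejects only links that are much worse than the router.
    return neigh_tq >= router_tq - BONDING_TQ_THRESHOLD


def accepted_symmetric(neigh_tq: int, router_tq: int) -> bool:
    # The abs() variant: also rejects links that are much better.
    return abs(neigh_tq - router_tq) <= BONDING_TQ_THRESHOLD
```

The two only disagree when the candidate is more than the threshold better than the router, which is exactly the case we do not expect to happen.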
[4] Why the primary and not the chosen router directly? Is the bonding candidates list always associated to the primary interface?
We might have chosen the originator of a secondary interface, but should also have the originator of the primary interface (as explained above, we receive its OGMs over the secondary interfaces as well). The primary orig will have all neighbors from secondary interfaces as well, and yes, the bonding candidates are only associated to this primary originator (to avoid duplication of the same information), so this is the proper originator to choose for bonding/alternation. This is merely an implementation detail, and does not change the routing decision.
Thanks again for your comments - I'll notify you when we have updated the protocol documentation for your review, if that's okay?
Cheers, Simon