Re: [B.A.T.M.A.N.] [Battlemesh] Battlemesh v5 tests

9 Mar 2012

Hey Gabriel,
thanks for bringing the discussion to the batman ml and giving some constructive
input. I've written this bonding/alternating feature some time ago, and we released
it at WBMv3 together with this little documentation to be found in the wiki. Actually,
I considered the feature rather simple and therefore I did not write too much about it
- because there is not really much to write about, or so I thought. Obviously, some
things were unclear, so thanks for pointing that out to me/us.
When implementing, it is easy to miss some things that are not that obvious
for outsiders, so please feel free to ask or suggest things. We'll rework the
bonding/interface alternating part in the next days, and would be
happy to include your suggestions. :)
Usually, we write the protocol documentation for review and as a reference
for other batman-adv devs - and we don't expect that they all
fall on the head at the same time. The documents are meant to describe the concept and
not the actual implementation with all its nasty details.
On Wed, Mar 07, 2012 at 11:18:48PM +0100, Gabriel Kerneis wrote:
...
[CC: b.a.t.m.a.n@lists.open-mesh.org, see note 3 in particular]
Antonio,
On Wed, Mar 07, 2012 at 06:17:52PM +0100, Antonio Quartulli wrote:
...
Technical details about what? Interface-alternating? It is there!
Gabriel wrote the link.
No. Please re-read my email carefully.  The wiki contains a rough explanation of
the general principle (ie. “same interface = bad, different interface = good”).
Not the actual algorithm used by batman-adv (quoting from the wiki: “the
algorithm tries to avoid forwarding packets on the interface which just received
the packet”).
Note that the wiki has been updated since then, by Simon with a few more
details [1], and by Marek with benchmark results from WBMv3.
Maybe "algorithm" is a big word for a little feature like that. The bonding
and interface alternating basically work in two steps:
1) detect that a neighbor is reachable via two different links
2) use the two different links for various manipulations (bonding, interface alternation)
1) The detection part is batman-specific, we use the PRIMARIES_FIRST_HOP flag
to do that. As a reminder (that might be documented somewhere else):
 * OGMs from the primary interface are broadcast on ALL interfaces and are spread over
   the mesh (big TTL) --> these get the PRIMARIES_FIRST_HOP flag, which is cleared
   when forwarded by other nodes
 * OGMs from the secondary interfaces are only broadcast on their respective interface
   and are only used for local link sensing (TTL = 1)
When we receive OGMs with the PRIMARIES_FIRST_HOP flag on different interfaces, we know
that they came from the same neighbor, just over different interfaces. We have two
links to this neighbor.
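To make the detection step a bit more concrete, here is a rough Python sketch. It is not the batman-adv code: the Ogm record, its field names and the helper functions are made up for illustration, and it only assumes what is said above (the flag is still set on OGMs heard directly from a neighbor's primary interface).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ogm:
    # Hypothetical, simplified OGM record (not the real packet layout).
    orig: str                  # originator address announced in the OGM
    recv_if: str               # local interface the OGM arrived on
    primaries_first_hop: bool  # still set => heard directly from the neighbor


def links_per_neighbor(ogms):
    """Map each direct neighbor to the set of local interfaces it was heard on."""
    links = defaultdict(set)
    for ogm in ogms:
        if ogm.primaries_first_hop:  # ignore OGMs forwarded by other nodes
            links[ogm.orig].add(ogm.recv_if)
    return dict(links)


def multi_link_neighbors(ogms):
    """Keep only neighbors that are reachable over more than one interface."""
    return {o: ifs for o, ifs in links_per_neighbor(ogms).items() if len(ifs) > 1}


received = [
    Ogm("A", "wlan0", True),
    Ogm("A", "wlan1", True),   # same neighbor, second link
    Ogm("B", "wlan0", True),   # neighbor with a single link
    Ogm("C", "wlan1", False),  # forwarded OGM, flag already cleared
]
print(multi_link_neighbors(received))
```

Only neighbor "A" ends up in the result, because it is the only originator whose flagged OGMs were seen on two different interfaces.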
2) The manipulation step is independent of the routing protocol, as long as the routing
protocol routes packets based on their destination and does not care which
interface they arrive on.
Because we have already made our routing decision (we have chosen a neighbor), it does not
matter on which link we send the frame. We use this freedom to either send on an interface
other than the one the frame came in on (interface alternation) or round-robin over the
available, detected links (bonding). Note that this would work with any routing protocol
and is independent of the BATMAN routing.
However, we rely on the fact that we are on layer 2 and can decide which link each packet
uses in batman-adv. This would not work so easily with static layer 3 routing tables, I suppose.
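The two manipulations can be illustrated with a small Python sketch. Again, this is only a sketch under assumptions and not the batman-adv implementation: the links are represented by plain interface names, and the function and class names are invented for the example.

```python
from itertools import cycle


def alternate(links, recv_if):
    """Interface alternation: prefer a link on an interface other than the
    one the frame arrived on; fall back to the first link if there is no choice."""
    for link in links:
        if link != recv_if:
            return link
    return links[0]


class Bonding:
    """Bonding: round-robin over all detected links to the chosen neighbor."""

    def __init__(self, links):
        self._next = cycle(links)

    def pick(self):
        return next(self._next)


# Links to the neighbor that the routing protocol has already chosen.
links = ["wlan0", "wlan1"]

print(alternate(links, recv_if="wlan0"))  # frame came in on wlan0
bond = Bonding(links)
print([bond.pick() for _ in range(4)])    # consecutive frames alternate links
```

In both cases the neighbor is already fixed before these functions run, which is why the routing decision itself is not affected.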
...
...
Gabriel said he has not enough time to look into it. I'm sorry, but I don't think
this is a good reason to blame batman-adv devs :P
I finally decided to settle this issue and spent my breakfast reading
batman-adv/routing.c [2] instead of my favorite newspaper.  Here is what I
understood:
At all times, batman-adv maintains a list of "bonding candidates" for each
node (bonding_candidate_add, called from bat_iv_ogm.c:699).
Some node "neigh" is a bonding candidate for another node "orig" if and only
if:
- neigh and orig have the same primary address, ie. are in fact the same
  router,

that's right - we are talking about one neighbor, and the bonding candidates are the
available links to this neighbor.
...
- the links to reach them have the same quality up to some additive
  constant (BONDING_TQ_THRESHOLD = 50) [3],

Yep, it would be useless if we can reach one link perfectly and the other one
is dropping all the packets. We want similar TQ quality.
...
- orig does not already have another bonding candidate for the same
  interface, because it could interfere – but what if the neigh has a better
  link quality, isn’t it a pity to ignore it?

If it had a better quality, it would have been chosen as router already - at least
we expect that here. Maybe this is a little rough, but using the same interface/frequency
is far worse, IMHO.
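Put together, the three conditions discussed above could be sketched like this in Python. The Link record and is_bonding_candidate() are hypothetical names for illustration only; the threshold value is the one quoted above, and the quality check is assumed to be one-sided (a link is only rejected when it is too far below the router's TQ).

```python
from dataclasses import dataclass

BONDING_TQ_THRESHOLD = 50  # value quoted in this thread


@dataclass
class Link:
    # Hypothetical, simplified view of one link to a neighbor.
    primary_addr: str  # primary address of the node behind this link
    iface: str         # local interface the link uses
    tq_avg: int        # averaged link quality


def is_bonding_candidate(link, router, candidates):
    """Check the three conditions for 'link' against the chosen 'router' link."""
    # 1) same primary address, i.e. the same physical router
    same_node = link.primary_addr == router.primary_addr
    # 2) quality not more than the threshold below the router's quality
    good_enough = link.tq_avg >= router.tq_avg - BONDING_TQ_THRESHOLD
    # 3) no existing candidate already uses this interface
    iface_free = all(c.iface != link.iface for c in candidates)
    return same_node and good_enough and iface_free


router = Link("A", "wlan0", 200)
candidates = [Link("A", "wlan1", 190)]

print(is_bonding_candidate(Link("A", "wlan2", 160), router, candidates))
print(is_bonding_candidate(Link("A", "wlan2", 140), router, candidates))
print(is_bonding_candidate(Link("A", "wlan1", 195), router, candidates))
```

With a router TQ of 200, the first link is accepted, the second is rejected for being more than 50 below the router, and the third is rejected because wlan1 already carries a candidate.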
...
Then, assuming that "interface alternating" is enabled, the list of bonding
candidates is used on every route selection (find_ifalter_router, called
from routing.c:769).

That's right. Interface alternating is always enabled, BTW.
...
More precisely, once batman has chosen a next-hop router for a packet based
on its classical routing algorithm, it walks the list of the bonding
candidates associated to the primary interface for this router [4].  It
selects the actual next-hop on the following criteria:
- it must not be on the same interface as the packet came in,
- its quality must be as high as possible (given the previous constraint).

This is the kind of explanation I would have loved to find on the wiki.  By the
way, consider it public domain and feel free to copy/paste/correct it if you
wish.
Thanks for sharing your explanation. I will happily include it on the rework of
this section.
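As a rough illustration of the selection rule described above (not on the incoming interface, then best quality), here is a minimal Python sketch. The candidates are simplified to (interface, tq_avg) pairs and find_alternate() is a made-up name; returning None when nothing qualifies is an assumption of the sketch, meaning the caller would simply keep the router it had already chosen.

```python
def find_alternate(candidates, recv_if):
    """Pick the best-quality candidate that is not on the incoming interface.

    candidates: list of (iface, tq_avg) pairs for one neighbor.
    Returns None if every candidate uses the incoming interface.
    """
    eligible = [c for c in candidates if c[0] != recv_if]
    return max(eligible, key=lambda c: c[1], default=None)


candidates = [("wlan0", 220), ("wlan1", 200), ("wlan2", 210)]

print(find_alternate(candidates, recv_if="wlan0"))
print(find_alternate([("wlan0", 220)], recv_if="wlan0"))
```

In the first call wlan0 is excluded because the packet came in on it, so the wlan2 link wins over wlan1 on quality.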
...
It is still not clear to me exactly why this works, but I believe this is what
the code does, and is definitely easier to discuss than generic, unsubstantiated
claims.
Best regards,
Gabriel
[1] “Interface alternating is only performed if the two candidate links to the
    next hop have a similar quality.”
    http://www.open-mesh.org/wiki/batman-adv/Multi-link-optimize
[2] http://www.open-mesh.org/projects/batman-adv/repository/revisions/master/ent...
[3] By the way, there is something I don’t understand: neigh_node->tq_avg will be
    accepted even if it is far greater than router->tq_avg + BONDING_TQ_THRESHOLD.
    Shouldn’t it be: abs(neigh_node->tq_avg - router->tq_avg) > BONDING_TQ_THRESHOLD?
    http://www.open-mesh.org/projects/batman-adv/repository/revisions/master/ent...
We expect that router->tq_avg is already the highest, so neigh_node->tq_avg shouldn't
be (far) higher than router->tq_avg.
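The difference between the two checks can be seen in a short Python sketch. It assumes the current check is one-sided, as note [3] describes it, and the function names are invented for the comparison.

```python
BONDING_TQ_THRESHOLD = 50  # value quoted in this thread


def rejected_one_sided(neigh_tq, router_tq):
    """Reject only if the neighbor link is too far BELOW the router's quality."""
    return neigh_tq < router_tq - BONDING_TQ_THRESHOLD


def rejected_symmetric(neigh_tq, router_tq):
    """Reject if the qualities differ by more than the threshold either way."""
    return abs(neigh_tq - router_tq) > BONDING_TQ_THRESHOLD


router_tq = 200

# As long as the router really has the highest TQ, both checks agree.
agree = all(
    rejected_one_sided(n, router_tq) == rejected_symmetric(n, router_tq)
    for n in range(0, router_tq + 1)
)
print(agree)

# They only differ when the neighbor link is far better than the router.
print(rejected_one_sided(255, router_tq), rejected_symmetric(255, router_tq))
```

So the two variants only behave differently in the case that is not expected to happen, namely a candidate link that is much better than the router.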
...
[4] Why the primary and not the chosen router directly? Is the bonding
    candidates list always associated to the primary interface?
We might have chosen the originator of a secondary interface, but should also
have the originator of the primary interface (as explained above, we receive
this over the secondary interfaces as well). The primary orig will have all
neighbors from secondary interfaces as well, and yes, the bonding candidates are
only associated to this primary originator (to avoid duplication of the same 
information), so this is the proper originator to choose for bonding/alternation.
This is merely an implementation issue, and does not change the routing
decision.
Thanks again for your comments - I'll notify you when we have updated
the protocol documentation for your review, if that's okay?
Cheers,
    Simon
