Hi Marek, Simon
I'm writing a protocol dissector for tcpdump which understands
batman-adv packets. To do this I need to use packet.h; at least that
is the easiest way to do it. tcpdump uses the BSD license, whereas
packet.h is GPL2, so it is unlikely the tcpdump maintainers would
accept packet.h as is.
As the two copyright owners of this file, is it OK with you if I
change the license on this one file in my tcpdump patch to BSD?
Thanks
Andrew
Hi Marek
I finally had time to dig into our problems with loops in our chain.
Some background for the list. Yang and I have been using User Mode
Linux (UML) to build a test network for batman advanced. We connect a
number of UML machines together using a modified version of
uml_switch. The modifications allow us to change the packet drop
probability between any two nodes. We have been testing using simple
chains as shown in the attached gif. The black lines show the links
currently in use. The red lines are other links which batman is
currently not using. The black links have a packet drop
probability of 0% and the red links 20%.
Our test was to remove uml5 from the network and see how long
batman-adv took to re-route around it. We ping from uml4 to uml6 and
from uml1 to uml9.
We found that uml4->uml6 would recover in around 14 seconds. However
uml1->uml9 took much longer, 65 seconds.
Looking at the routing, we found it went into loops. When sending from
uml1 to uml9, uml1 routes to uml2, uml2 routes back to uml1.
Here are the logs from uml2. I've cut out most of the packets, just
showing OGMs from uml9. There is a simple relationship between the MAC
address and the uml number:
fe:fe:00:00:01:01 - uml1
fe:fe:00:00:02:01 - uml2
fe:fe:00:00:03:01 - uml3
etc...
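The mapping above can be sketched as a small helper for reading the logs. This is only an illustration: the function name uml_name is made up here, and reading the fifth octet as hexadecimal is an assumption (the examples in this mail only cover uml1 to uml9, where hex and decimal agree).

```python
def uml_name(mac: str) -> str:
    """Map a test-network MAC such as fe:fe:00:00:09:01 to 'uml9'.

    Assumption: the fifth octet carries the uml number and is read as hex.
    """
    octets = mac.lower().split(":")
    if len(octets) != 6 or octets[:4] != ["fe", "fe", "00", "00"]:
        raise ValueError(f"not a test-network MAC address: {mac!r}")
    return f"uml{int(octets[4], 16)}"


print(uml_name("fe:fe:00:00:02:01"))
```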
[ 42949558] Received BATMAN packet via NB: fe:fe:00:00:03:01, IF: eth1 [fe:fe:00:00:02:01] (from OG: fe:fe:00:00:09:01, via old OG: fe:fe:00:00:04:01, seqno 146, tq 218, TTL 44, V 7, IDF 0)
[ 42949558] bidirectional: orig = fe:fe:00:00:09:01 neigh = fe:fe:00:00:03:01 => own_bcast = 64, real recv = 64, local tq: 255, asym_penalty: 255, total tq: 218
[ 42949558] update_originator(): Searching and updating originator entry of received packet
[ 42949558] Updating existing last-hop neighbour of originator
[ 42949558] Drop packet: duplicate packet received
This was received from uml3, originally from uml4. The TQ to uml9 via
uml3 is 218.
[ 42949559] Received BATMAN packet via NB: fe:fe:00:00:01:01, IF: eth1 [fe:fe:00:00:02:01] (from OG: fe:fe:00:00:09:01, via old OG: fe:fe:00:00:03:01, seqno 146, tq 209, TTL 42, V 7, IDF 0)
[ 42949559] bidirectional: orig = fe:fe:00:00:09:01 neigh = fe:fe:00:00:01:01 => own_bcast = 64, real recv = 64, local tq: 255, asym_penalty: 255, total tq: 209
[ 42949559] update_originator(): Searching and updating originator entry of received packet
[ 42949559] Updating existing last-hop neighbour of originator
[ 42949559] Drop packet: duplicate packet received
This is where it starts to get interesting. This is from uml1,
originally from uml3, so it has skipped uml2 by using the 20% packet
drop link which exists between uml1 and uml3. Because this is not an
echo, uml2 processes it, and now knows that it can get to uml9 via
uml1 with a TQ of 209.
[ 42949559] Sending own packet (originator fe:fe:00:00:02:01, seqno 155, TQ 255, TTL 50, IDF off) on interface eth1 [fe:fe:00:00:02:01]
[ 42949559] Forwarding aggregated packet (originator fe:fe:00:00:06:01, seqno 152, TQ 232, TTL 46, IDF off) on interface eth1 [fe:fe:00:00:02:01]
[ 42949559] Forwarding aggregated packet (originator fe:fe:00:00:09:01, seqno 146, TQ 215, TTL 43, IDF off) on interface eth1 [fe:fe:00:00:02:01]
[ 42949559] Forwarding packet (originator fe:fe:00:00:01:01, seqno 156, TQ 250, TTL 49, IDF on) on interface eth1 [fe:fe:00:00:02:01]
[ 42949560] Received BATMAN packet via NB: fe:fe:00:00:03:01, IF: eth1 [fe:fe:00:00:02:01] (from OG: fe:fe:00:00:09:01, via old OG: fe:fe:00:00:04:01, seqno 148, tq 150, TTL 45, V 7, IDF 0)
[ 42949560] updating last_seqno: old 146, new 148
[ 42949560] bidirectional: orig = fe:fe:00:00:09:01 neigh = fe:fe:00:00:03:01 => own_bcast = 64, real recv = 64, local tq: 255, asym_penalty: 255, total tq: 150
[ 42949560] update_originator(): Searching and updating originator entry of received packet
[ 42949560] Updating existing last-hop neighbour of originator
[ 42949560] Changing route towards: fe:fe:00:00:09:01 (now via fe:fe:00:00:01:01 - was via fe:fe:00:00:03:01)
[ 42949560] Forwarding packet: rebroadcast originator packet
[ 42949560] Forwarding packet: tq_orig: 150, tq_avg: 209, tq_forw: 204, ttl_orig: 44, ttl_forw: 255
Now things go non-optimal :-(
This is from uml3, originally from uml4. The TQ value has dropped to
150. This is when we removed uml5, so the TQ naturally drops.
The TQ value via uml3 is now less than the TQ value via uml1, so uml2
changes its route to go via uml1.
Looking at the logs of uml1, uml1 is always routing to uml9 via uml2.
The problem here, I think, is to do with the asymmetric link algorithm.
When sending out an OGM, the node uses the TQ for its best link to the
originator, not the link the OGM came in on. If the OGM from uml1,
originally from uml3, reported the TQ via that route, the TQ would
very likely be lower. uml2 would then not have chosen to swap to
uml1. However, uml1 reports its best route, which is via uml2. uml2
does not know this, decides to use uml1, and we have a loop.
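To make the suspected sequence concrete, here is a toy model of it. It is a deliberate simplification and not the batman-adv code: each node simply picks the neighbour with the highest TQ, using the two TQ values from uml2's log (150 via uml3, 209 via uml1), and uml1's next hop is taken from its logs. The function names are made up for illustration.

```python
def best_next_hop(tq_by_neighbour):
    """Pick the neighbour with the highest TQ (simplified route selection)."""
    return max(tq_by_neighbour, key=tq_by_neighbour.get)


def find_loop(next_hop, src, dst):
    """Follow next hops from src towards dst; return the loop if one is hit."""
    path = [src]
    node = src
    while node != dst:
        node = next_hop.get(node)
        if node is None:
            return None  # no route known, but no loop either
        if node in path:
            return path[path.index(node):] + [node]
        path.append(node)
    return None


# TQ towards uml9 as seen by uml2 in the log above.
uml2_tq = {"uml3": 150, "uml1": 209}

next_hop_to_uml9 = {
    "uml1": "uml2",                  # from uml1's logs
    "uml2": best_next_hop(uml2_tq),  # uml2 switches to uml1
}

print(find_loop(next_hop_to_uml9, "uml1", "uml9"))
```

With these inputs the walk from uml1 comes straight back to uml1 via uml2, which matches the loop seen in the test network.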
Does this all hang together correctly? Am I interpreting this all
right?
How would you suggest fixing this?
Thanks
Andrew
Hi there,
I'm having some trouble with the vis-output in batman-adv 0.2 beta
- compatibility-version 7. It seems like the 'From' part gets
substituted with the MAC address of a node's primary interface,
while the 'to' part does not. The result is that when I try to
convert the dot files to png/svg/etc. with neato from the
graphviz toolchain, I get two separate graphs.
Of course, the easiest way might be to substitute the 'to' part
with the MAC addresses of the primary interfaces as well, but I'd
prefer something different:
http://www.graphviz.org/Gallery/directed/cluster.html
Those subgraph clusters from the dot file format would be a great
way to get a more detailed picture of how two batman nodes are
connected, and over which interfaces they are connected to each
other.
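As a rough sketch of what such output could look like, the following generates a dot file with one cluster per batman node and one graph node per interface. This is not the current vis output format; the function name to_dot, the input layout, and the MAC addresses and labels in the example are all made up for illustration.

```python
def to_dot(nodes, links):
    """Build a dot graph with one cluster per mesh node.

    nodes: {node label: [interface MAC, ...]}
    links: [(source interface MAC, destination interface MAC, edge label), ...]
    """
    lines = ["digraph mesh {"]
    for index, (label, macs) in enumerate(nodes.items()):
        # Graphviz treats a subgraph as a cluster if its name starts with "cluster".
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{label}";')
        for mac in macs:
            lines.append(f'    "{mac}";')
        lines.append("  }")
    for src, dst, label in links:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: two nodes with two interfaces each.
example_nodes = {
    "node A": ["02:00:00:00:0a:01", "02:00:00:00:0a:02"],
    "node B": ["02:00:00:00:0b:01", "02:00:00:00:0b:02"],
}
example_links = [
    ("02:00:00:00:0a:02", "02:00:00:00:0b:01", "1.2"),
    ("02:00:00:00:0b:01", "02:00:00:00:0a:02", "1.1"),
]
print(to_dot(example_nodes, example_links))
```

The dot layout tool draws the cluster boxes; whether other layout tools such as neato do so may depend on the Graphviz version.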
I had a little look at the batman-adv source code and noticed that
the structure vis_info_entry in vis.h only has a dest and
not a src entry. So far, then, the vis server only gets to know the
node's primary MAC address as the source, but either a primary or a
secondary MAC in dest, right?
I'm also still trying to figure out where such src information
could come from... dest gets that stuff from orig_node->orig, and
there is a list of interfaces in orig_node->batman_if and also
a neighbour list in orig_node->router. What would have to be used
for the src address?
I'm also wondering what those hashing functions are for exactly.
Would I break some of that stuff by introducing a src address in
the vis packet?
I'm not that much into reading the batman-adv source code yet, but
I'd like to get a little more familiar with it. So please be
considerate :).
Cheers, Linus
PS: I have a router with VLANs here, so two separate
LAN ports, which batman is using, have the same MAC address.
Could that cause any trouble?
Hi,
>I have 3 linksys wrt54 flashed with openwrt 7.09, webif'ed and installed
> BATMAN 0.2-rv478.1, all OK I think.
Not sure what web interface you installed, but are you aware that
revision 478 is more than 2 years old? Your current problem seems unrelated
to batman, but if you experience trouble / bugs with batman it might be wise to
upgrade to the latest stable release.
>From the ssh prompt on .3 I can ping .2 with no loss and ping .1 with ~ 6%
> loss. However, if I try to ping .1 from the pc connected on the ethernet
> port of .3 I get no route to host. Putting wireshark on the ethernet ifc
> shows the arp going out but nothing coming back. Interestingly enough if I
> ssh into .3 and ping .1, with wireshark running on the ethernet of .3 I
> see the outgoing ICMP but not the reply (which is there because the ping
> succeeds). It seems as if the local host traffic is not truly bridged.
You bridged the LAN & WLAN on every device, but the devices are not bridged
with each other, hence you mix bridging & routing.
What IP address did you assign to your PC? Does your PC have a route towards
.1 (probably not)? You also have to make sure that all nodes have a route
towards your PC. You can achieve that using HNA. See
http://www.open-mesh.net/wiki/AnnouncingNetworks for a detailed explanation.
Regards,
Marek