I finally got back home a week ago and found time to debug a strange
issue here. A few users had reported "intermittent connectivity":
traffic arriving in "waves", with random pauses lasting from a few
seconds to a minute or so.
I initially dismissed it as interference, or even OS problems, but it
turns out they were right! And sadly, batman seems to be in the way.
From what i've seen, watching "batctl tg -w" on every node along the
way, i could determine the window of time where the traffic gets lost:
from the moment when there's a TT change on one side of the network,
to the moment that change is propagated to the other side.
On ordex's advice, I ran "batctl ll tt ; batctl l" on the nodes along the
way, and I'm sending the pastebin results at the end of this mail.
Some (hopefully) useful context follows, and a batctl vd graph is attached.
The IPv6 of tdorado is pinged (to rule out DAT interactions) from
labanda-este (works fine always) and from labanda-oeste (suffers the
issue, as well as all nodes "behind" it, i.e. casapuente & alfredo).
Both labandas are TL-WDR3500s connected by 2.4 GHz, 5 GHz, and an ethernet
cable. The ethernet carries only batadv packets (eth0.1 is added to
bat0); there's no "lan backbone" (the eth0.2 that appears under br-lan
is not connected to anything).
root@tdorado:~# opkg list kmod-batman-adv # same in all nodes
kmod-batman-adv - 3.8.3+2013.2.0-2
root@tdorado:~# ip a s br-lan
6: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 64:70:02:3d:a0:f7 brd ff:ff:ff:ff:ff:ff
inet6 2a00:1508:1:f004:6670:2ff:fe3d:a0f7/64 scope global dynamic
valid_lft 6985sec preferred_lft 6985sec
root@tdorado:~# batctl tl -n |grep f7
* 64:70:02:3d:a0:f7 [.....] 1.140
root@tdorado:~# batctl o |head -n 1
[B.A.T.M.A.N. adv 2013.2.0, MainIF/MAC: wlan0-1/66:70:02:3d:a0:f9 (bat0)]
Both ping6s were started at the same time, so the seq numbers are
synchronized, and can be used as timestamps.
the "gap" in labanda-oeste is between seq=73 and seq=89
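For anyone repeating the test, here is a minimal Python sketch for spotting such gaps in saved ping6 output. It assumes each reply line contains "seq=N" (busybox style) or "icmp_seq=N" (iputils style); the function name find_gaps is only an illustration, not part of any tool mentioned here.

```python
import re
import sys

# Matches both "seq=73" (busybox ping6) and "icmp_seq=73" (iputils).
SEQ_RE = re.compile(r"seq=(\d+)")


def find_gaps(lines):
    """Return (last_ok, next_ok) pairs wherever sequence numbers were skipped."""
    gaps = []
    prev = None
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # header, summary, or error line without a seq number
        seq = int(match.group(1))
        if prev is not None and seq > prev + 1:
            gaps.append((prev, seq))
        prev = seq
    return gaps


if __name__ == "__main__":
    # Reads ping6 output from stdin and prints one line per gap.
    for last_ok, next_ok in find_gaps(sys.stdin):
        missing = next_ok - last_ok - 1
        print(f"gap: seq={last_ok} -> seq={next_ok} ({missing} replies missing)")
```

Feeding it the saved output of each ping6 (for example "python3 find_gaps.py < ping-oeste.log", both file names hypothetical) lists every window where replies went missing, which can then be lined up against the batctl log.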
in labanda-oeste there were no messages or traffic for 25 secs, and then
the "TT inconsistency" came up, was resolved, and seq=89 succeeded, traffic
at that time, seq=74, labanda-este got a TT update:
[ 23161800] Deleting tdorado from global tt entry 44:d8:84:b0:d2:f5: tt removed by changes
and (AFAIU) dropped traffic coming from labanda-oeste until
labanda-oeste finally got the TT update and increased the ttvn to 129.
Does any of this make sense? I imagine a tcpdump would help, so I'll try
getting one, but maybe this debug output is already enough to give some insight?
As you can imagine, any further pointer will be greatly appreciated,
I hope you're having a great week, ...and that i'm not ruining it as