Hi, I've been running bmxd_rv972 for almost three months now without any major problems :-)... thanks. What I noticed lately: two nodes (the ones with more than one bmx interface) show alternativeNextHops which are *NO* alternative. It seems some 'information' gets lost between the two IFs. I suppose that's a known issue ;-), any ideas how to fix it or work around it? IIRC, I saw once or twice a node "on the other interface" even listed as the bestNextHop.
cheers
--Jan
Can you describe a little more extensively your network configuration?
B.A.T.M.A.N mailing list B.A.T.M.A.N@open-mesh.net https://list.open-mesh.net/mm/listinfo/b.a.t.m.a.n
On Mon, Apr 21, 2008 at 02:57:44PM +0200, Benjamin Henrion wrote:
On Mon, Apr 21, 2008 at 2:39 PM, Jan Hetges tran@ms20.net wrote:
i'm running bmxd_rv972 for almost three months now, without any mayor problems :-)... thanks. what i recognized lately, on two nodes (the ones with more than one bmx interface), show alternativeNextHops, which are *NO* alternative. It seems there gets some 'information' lost between the two IFs. I suppose that's a known issue ;-), any ideas how to fix/workaround ? iirc, i saw once or twice a node "on the other interface" even listed as the bestNextHop.
Can you describe a little more extensively your network configuration?
sorry ;-):
x.x.0.0/24---x.x.0.1/x.x.3.1---x.x.3.0/24---x.x.3.160/x.x.4.1---x.x.4.0/24 | internet
The batman network is x.x.0.0/20. .0.1/.3.1 is one node with two radios, and .3.160/.4.1 is the other one. .3.160/.4.1 lists .4.162 as alternativeNextHop to .3.128, although .3.1 and .3.137 can see .3.128 but .4.162 cannot. So there is a real alternativeNextHop to .3.128, namely .3.137, which is listed after .4.162 on .3.160/.4.1. I attach the output of bmxd -cbd8. But now that I really think it through, it shouldn't matter much, because if the connection between .3.160 and .3.1 gets interrupted, .4.162 should probably fall back behind .3.137 in .3.160's statistics.
--Jan
Hi Jan,
On Montag 21 April 2008, Jan Hetges wrote:
On Mon, Apr 21, 2008 at 02:57:44PM +0200, Benjamin Henrion wrote:
On Mon, Apr 21, 2008 at 2:39 PM, Jan Hetges tran@ms20.net wrote:
i'm running bmxd_rv972 for almost three months now, without any mayor problems :-)... thanks. what i recognized lately, on two nodes (the ones with more than one bmx interface), show alternativeNextHops, which are *NO* alternative. It seems there gets some 'information' lost between the two IFs. I suppose that's a known issue ;-),
It is NOT a known issue (at least not to me), and if it's a bug it should be fixed.
fix/workaround ? iirc, i saw once or twice a node "on the other interface" even listed as the bestNextHop.
Can you describe a little more extensively your network configuration?
sorry ;-):
x.x.0.0/24---x.x.0.1/x.x.3.1---x.x.3.0/24---x.x.3.160/x.x.4.1---x.x.4.0/24
One general question: do all your interfaces operate on the same frequency?
Then, I am unsure about the netmasks you are using. From the line above I understand that you are using x.x.3.0/24 and x.x.4.0/24 netmasks. Generally, it is strongly recommended to always use the same netmask on ALL batman interfaces! In your case there is x.x.3.160/24 for one interface on the 3.160 node, which should result in a broadcast address of x.x.3.255. But according to the debug output there is a direct link to x.x.4.165/24 (which I guess has a broadcast address of x.x.4.255), and therefore the two interfaces should not see each other!? Can you verify that? (Note that the "ifconfig dev ip/netmask" command is buggy and does not always produce corresponding netmask and broadcast addresses, and that the interfaces MUST be configured appropriately before the daemon is started!)
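To double-check which broadcast address a given address/prefix pair implies, a small Python sketch may help. It is not part of bmxd; the helper name broadcast_for is made up for illustration, and 172.19 is used in place of x.x only because that prefix appears in the debug log quoted in this thread.

```python
import ipaddress

def broadcast_for(address_with_prefix: str) -> str:
    """Return the broadcast address implied by an 'ip/prefix' string."""
    iface = ipaddress.ip_interface(address_with_prefix)
    return str(iface.network.broadcast_address)

# 172.19 stands in for the x.x prefix (assumption taken from the debug log).
for addr in ("172.19.3.160/24", "172.19.4.165/24",
             "172.19.3.160/20", "172.19.4.165/20"):
    print(addr, "->", broadcast_for(addr))
```

With a /24 the two interfaces end up with different broadcast addresses (.3.255 and .4.255), while a common /20 gives both of them .15.255.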
.0.1/.3.1 is one node with two radios, and .3.160/.4.1 the other one. and, .3.160/.4.1 lists .4.162 as alternativeNextHop to .3.128. where .3.1 and .3.137 can see .3.128, but .4.162 can not.
An alternativeNextHop to a specific node need not be a direct neighbor of that node. For example, in the following scenario:
A---B---D
|       |
+---E---+
From A's point of view, B and E may both be potential next hops towards D. But only E can directly see D.
Is it possible to generate (almost simultaneously) -cbd8 logs from the involved nodes, especially 3.1, 3.160, 4.162, 3.128.
So, there is a real alternativeNextHop to .3.128 ... .3.137, which is listed after .4.162 on .3.160/.4.1. I attach the output of bmxd -cbd8.
The attached debug log shows:
172.19.3.128 wlan0:bmx 172.19.3.1 80 ( 97 1:01:20:33 15813 0 100 1012 18 2 1 ) 172.19.4.162 67 172.19.3.140 2
At least this line does not show 3.137 listed after .4.162 on .3.160/.4.1. Has it been truncated?
Looking forward to solving this...
best regards,
axel
On Tue, Apr 22, 2008 at 04:01:57PM +0200, Axel Neumann wrote:
Hi Jan,
On Montag 21 April 2008, Jan Hetges wrote:
On Mon, Apr 21, 2008 at 02:57:44PM +0200, Benjamin Henrion wrote:
On Mon, Apr 21, 2008 at 2:39 PM, Jan Hetges tran@ms20.net wrote:
what i recognized lately, on two nodes (the ones with more than one bmx interface), show alternativeNextHops, which are *NO* alternative. It seems there gets some 'information' lost between the two IFs. I suppose that's a known issue ;-),
It is NOT a known issue (at least not to me) and if its a bug it should be fixed.
so, let's fix it then ;-)
Can you describe a little more extensively your network configuration?
x.x.0.0/24---x.x.0.1/x.x.3.1---x.x.3.0/24---x.x.3.160/x.x.4.1---x.x.4.0/24
One general questions: do all your interfaces operate on the same frequency?
no
Then, I am unsure about the netmasks you are using. The above line I understand that you are using x.x.3.0/24 and x.x.4.0/24 netmasks.
sorry, the /24 only specifies the net range of the SSID it belongs to
Generally, It is strongly recommended to always use the same netmask on ALL batman interfaces !
I know, you helped me fix that :-) And if you look again at my previous mail, I also wrote: batman network x.x.0.0/20. So ALL batman interfaces are broadcasting to x.x.15.255.
But according to the debug output there is a direct link to x.x.4.165/24 (which I guess has broadcast address of x.x.4.255) and therefore the two interfaces should not see each other!!??
Direct links always exist only within the corresponding /24 net range, because the ranges are on different channels/SSIDs, so .3.160 and .4.1 cannot "see" each other. Note: NO .4.x node can see ANY .3.x node, even if they were on the same channel/SSID!
Can you verify that (note that the "ifconfig dev ip/netmask" command is buggy and does not always produce corresponding netmask and broadcast addresses and that the interfaces MUST be configured appropriately before the daemon is started. !!!)
I noticed the buggy ifconfig and ALWAYS set netmask and broadcast addresses.
.0.1/.3.1 is one node with two radios, and .3.160/.4.1 the other one. and, .3.160/.4.1 lists .4.162 as alternativeNextHop to .3.128. where .3.1 and .3.137 can see .3.128, but .4.162 can not.
An alternativeNextHop to a specific node must not necessarily be a direct neighbor of that node. For example in the following scenario:
claro
A---B---D
|       |
+---E---+
From A's point of view, B and E may both be potential next hops towards D. But only E can directly see D.
Is it possible to generate (almost simultaneously) -cbd8 logs from the involved nodes, especially 3.1, 3.160, 4.162, 3.128.
attached, note that .3.137 is down due to power issues, and the link between .3.140 and .3.160 is not usable, but you should see the issue.
So, there is a real alternativeNextHop to .3.128 ... .3.137, which is listed after .4.162 on .3.160/.4.1. I attach the output of bmxd -cbd8.
The attached debug log shows: 172.19.3.128 wlan0:bmx 172.19.3.1 80 ( 97 1:01:20:33 15813 0 100 1012 18 2 1 ) 172.19.4.162 67 172.19.3.140 2
right, but it SHOULD show .3.137, and NOT .4.162 at all. The ONLY link between .4.x and .3.x IS .4.1/.3.160 (that's the reason for having a repeater there :)
At least this line does not show 3.137 listed after .4.162 on .3.160/.4.1 Has it been truncated ??
i don't think so, i'll make some more logs when .3.137 is back up
Looking forward to solve this...
cheers
--Jan
Hey,
But according to the debug output there is a direct link to x.x.4.165/24 (which I guess has broadcast address of x.x.4.255) and therefore the two interfaces should not see each other!!??
there are direct links always only in the according /24 netrange, because they are on different channels/ssids, so .3.160 and .4.1 cannot "see" each other. Note, NO .4.x node cannot see ANY .3.x node, even if they would be on the same channel/ssid!
Are you also specifying the BSSID? In order to define different cells in an ad-hoc network, the SSID is almost useless in most ad-hoc implementations. The BSSID is much more important! E.g. use the kamikaze-etc/config/wireless style:
option bssid 44:ca:ff:ee:ba:be
or the command-line style:
iwconfig wlan0 ap 44:ca:ff:ee:ba:be
Anyway, you mean a debug output on 3.160 like the following cannot be correct?
Neighbor      outgoingIF  bestNextHop   brc  ( rcvd knownSince lseq lvld rid sid ) [ viaIF     RTQ RQ TQ]..
172.19.4.165  wlan0:bmx   172.19.4.165   55  (   84 0:00:28:03 9777    1   1   4 ) [ wlan0:bmx  46 76 60]
Sorry for being stubborn: what makes you so sure that they cannot see each other or that the driver does not mix things up?
At least from the madwifi driver (in ad-hoc mode) I know that it tends to ignore its assigned *bssid* and channel and switches back and forth between the assigned one and others.
Sometimes the driver hangs on a wrong bssid and channel, forwards the wrong IP packets, and cannot receive any packets from its originally assigned bssid/channel.
One way to verify this would be to run a tcpdump on interface 3.160/wlan0 and see if it ever tracks a batman packet with a src address of e.g. 172.19.4.165
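The check suggested here boils down to: does any captured source address fall outside the net range the interface is supposed to hear? A minimal Python sketch of that comparison follows. It assumes the source addresses have already been extracted from the capture by some other means; the function name foreign_sources and the sample list are hypothetical and not real capture data.

```python
import ipaddress

def foreign_sources(src_ips, expected_net):
    """Return source addresses outside expected_net (input order, no duplicates)."""
    net = ipaddress.ip_network(expected_net)
    found = []
    for ip in src_ips:
        if ipaddress.ip_address(ip) not in net and ip not in found:
            found.append(ip)
    return found

# Hypothetical list of source addresses seen on 3.160/wlan0 (not a real capture).
observed = ["172.19.3.1", "172.19.4.165", "172.19.3.128", "172.19.4.165"]
print(foreign_sources(observed, "172.19.3.0/24"))
```

Any address this returns would be a hint that a node from the other cell is being heard on this interface.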
i'll make some more logs when .3.137 is back up
ok.
ciao, axel
Hi Axel,
thanks for being stubborn :-)
On Wed, Apr 23, 2008 at 09:54:37AM +0200, Axel Neumann wrote:
But according to the debug output there is a direct link to x.x.4.165/24 (which I guess has broadcast address of x.x.4.255) and therefore the two interfaces should not see each other!!??
there are direct links always only in the according /24 netrange, because they are on different channels/ssids, so .3.160 and .4.1 cannot "see" each other. Note, NO .4.x node cannot see ANY .3.x node, even if they would be on the same channel/ssid!
ok, .4.165 can see .3.160
Are you also specifying the bssid? Because in order to define different cells in an ad-hoc network the ssid is almost useless in most adhoc implementations. THe BSSID is much more important! (E.g. use kamikaze-etc/config/wireless style: option bssid 44:ca:ff:ee:ba:be or the command-line-style: iwconfig wlan0 ap 44:ca:ff:ee:ba:be )
thanks, didn't know that ^^
Anyway, you mean a debug output on 3.16 like the following can not be correct? Neighbor outgoingIF bestNextHop brc (rcvd knownSince lseq lvld rid sid ) [ viaIF RTQ RQ TQ].. 172.19.4.165 wlan0:bmx 172.19.4.165 55 ( 84 0:00:28:03 9777 1 1 4 ) [ wlan0:bmx 46 76 60]
didn't see that ^^^^^
Sorry for being stubborn: what makes you so sure that they cannot see each other or that the driver does not mix things up?
At least from the madwifi driver (in ad-hoc mode) I know that it tends to ignore its assigned *bssid* and channel and switches back and forth between the assigned one and others.
Sometimes the driver hangs on a wrong bssid and channel, forwards the wrong IP packets, and cannot receive any packets from its originally assigned bssid/channel.
One way to verify this would be to run a tcpdump on interface 3.160/wlan0 and see if it ever tracks a batman packet with a src address of e.g. 172.19.4.165
OK, what happened: .4.165 (i386/prism2.5/hostap(-pci)) switched for some unknown reason to the .3.x channel/ESSID (which explains the poor link from .4.165 to ".4.1" (actually .3.160), and I thought it was a fast-growing Guahumo tree :). So I fixed the BSSIDs on both .4.1 and .4.165, and it seems not to switch anymore... thanks for being so stubborn :-) But that doesn't change the fact that .4.1 still shows .4.162 as alternativeNextHop to .3.x, when in fact all alternative paths inside .4.x to .3.x lead back to .4.1!
i'll make some more logs when .3.137 is back up
still down, anyways, i attach -cbd8 from .4.1, .4.160, .4.162 and .4.165
cheers
--Jan
Hi,
ok, what happend: .4.165 (i386/prism2.5/hostap(-pci)) switched for some unknown reason to .3.x/channel/essid (which explains the poor link from .4.165 to ".4.1"(actually .3.160), and i thought it was a fast growing Guahumo tree :). So i fixed the bssids on both, .4.1 and .4.165. And it seems to not switch anymore...
So let's hope that the prism2.5 behavior is more predictable than madwifi's.
thanks for being so stubborn :-) But that doesn't change the fact, that .4.1 still shows .4.162 as alternativeNextHop to .3.x, where, in fact, all alternative paths inside .4.x to .3.x lead back to .4.1 !
Yes, that's correct. It's not a bug, it's a design property of the batman protocol. 3.1 generates OGMs which are rebroadcast via 4.1. Then they are received by 4.162 and 4.160, which rebroadcast them again. Consequently, 4.162 and 4.160 will both receive 3.1-OGMs from each other.
3.x----4.1--+- - -4.162-+
            |           |
            +---4.160---+
Now assume that, for some reason, the link between 4.1 and 4.162 is weak; then 4.162 might receive more OGMs via 4.160 than via 4.1. Consequently, 4.1 will receive the 3.x-OGMs rebroadcast by 4.162 and consider 4.162 an alternativeNextHop towards 3.x. The good thing is that the number of OGMs received via such a "wrong path" can NEVER become larger (or arrive faster) than the number received via the "correct path", and therefore such a path should never be chosen.
Theoretically, the same applies to a setup like this:
3.x----4.1---4.162
where 4.1 receives the 3.x-OGMs rebroadcast by 4.162. But for paranoid reasons we included a mechanism which allows 4.1 to identify OGMs that travelled via itself just one hop ago.
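The effect described above can be reproduced with a toy simulation. This is only a sketch under loud assumptions: plain flooding where every node rebroadcasts an OGM the first time it hears it (not bmxd's actual rebroadcast rules), made-up link delivery probabilities, and the one-hop check modelled as "drop an OGM whose rebroadcaster got it from me". The names LINKS, neighbors and flood are illustrative only.

```python
import random
from collections import defaultdict, deque

# Made-up, symmetric per-link delivery probabilities.
LINKS = {
    ("3.x", "4.1"): 0.95,
    ("4.1", "4.160"): 0.9,
    ("4.160", "4.162"): 0.9,
    ("4.1", "4.162"): 0.2,  # the weak link, drawn as "- - -" in the diagram
}

def neighbors(node):
    """Yield (neighbor, delivery_probability) for every link of node."""
    for (a, b), p in LINKS.items():
        if a == node:
            yield b, p
        elif b == node:
            yield a, p

def flood(originator, rounds, seed=1):
    """Return heard[node][via] = set of OGM sequence numbers received via 'via'."""
    rng = random.Random(seed)
    heard = defaultdict(lambda: defaultdict(set))
    for seq in range(rounds):
        seen = {originator}
        # Each entry: (sender, node the sender first got this OGM from).
        queue = deque([(originator, None)])
        while queue:
            sender, prev = queue.popleft()
            for rcv, p in neighbors(sender):
                if rcv == originator or rng.random() >= p:
                    continue  # own OGM, or packet lost on this link
                if rcv == prev:
                    continue  # OGM travelled via rcv one hop ago: ignored
                heard[rcv][sender].add(seq)
                if rcv not in seen:
                    seen.add(rcv)
                    queue.append((rcv, sender))
    return heard

heard = flood("3.x", rounds=1000)
for via, seqs in sorted(heard["4.1"].items()):
    print("4.1 heard 3.x OGMs via", via, ":", len(seqs))
```

In this model 4.1 does hear some 3.x OGMs via 4.162 (those that reached 4.162 through 4.160), so 4.162 shows up as an alternative, but every one of them was heard directly from 3.x first, so the count via 4.162 can never exceed the direct count.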
happy ruminating,
axel
On Wed, Apr 23, 2008 at 09:54:37AM +0200, Axel Neumann wrote: [..snip..]
i'll make some more logs when .3.137 is back up
ok.
There you go (to make the confusion complete, I named them like before ;-). And something completely different: I think .4.1 should choose the direct link to .4.162 instead of going through .4.160 (and vice versa).
cheers
--Jan
Hi,
On Donnerstag 24 April 2008, Jan Hetges wrote:
and something completely different: i think, .4.1 should choose the direkt link to .4.162 instead of going through .4.160 (and vice versa)
I experienced the same. Sometimes, from the end2end throughput point of view, it would be better to choose the direct route, and sometimes not. I played a lot with the metric in charge of the final route. But in the end I am unsure which parameters are responsible for that, or whether the better path can be identified over several hops at all (perhaps if we take hop-by-hop bandwidth, interference, load, and other stuff into account). You could twist the metric so that it always prefers end2end routes with fewer hops, but I think in the end it is most important to have a general and unique metric applied to every routing instance in your mesh.
Don't know if you want to hear it, but you can play with --dups-ttl-degradation X (current default is X=2). This degrades the preference for a path by 2 percent with every additional hop (relative to the shortest path). So if you use 50 instead, a node with two alternative paths (one single-hop and one two-hop path) to a given destination will ignore 50% of the OGMs received via the two-hop path. Then it will probably choose the single-hop path. And of course, if you find a value which generally works better for all your nodes, let us know.
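To get a feeling for the numbers, here is a toy model of that degradation in Python. It is not bmxd's code: the function names and the OGM counts are made up, and compounding the percentage per extra hop is an assumption (for a single extra hop it makes no difference).

```python
def effective_ogms(received, hops, shortest_hops, degradation_pct):
    """Discount a path's OGM count by degradation_pct per hop beyond the shortest path."""
    extra_hops = hops - shortest_hops
    return received * (1 - degradation_pct / 100) ** extra_hops

def preferred_path(paths, degradation_pct):
    """paths maps a name to (received_ogms, hops); return the best-rated name."""
    shortest = min(hops for _, hops in paths.values())
    return max(
        paths,
        key=lambda name: effective_ogms(
            paths[name][0], paths[name][1], shortest, degradation_pct
        ),
    )

# Made-up counts: the two-hop path delivers more OGMs than the direct link.
paths = {"direct": (60, 1), "via 4.160": (80, 2)}
print("X=2 :", preferred_path(paths, 2))
print("X=50:", preferred_path(paths, 50))
```

With these counts the two-hop path still wins at X=2 (80 is only discounted to about 78), whereas at X=50 it is discounted to 40 and the direct link is preferred.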
ciao, axel