[B.A.T.M.A.N.] batman-exp rev.1154 still using 94% CPU load


Stephan Enderlein (Freifunk Dresden)

2 Dec 2008

11:30 a.m.

Hi,

I still have the problem that batman-exp hangs at 94% CPU load. Perhaps it has nothing to do with the gateway task. Is it possible to run several "batmand -c" clients at the same time? The current batman revision is 1154. Do you have any idea?

Mem: 15700K used, 14924K free, 0K shrd, 1440K buff, 6600K cached
CPU: 5.8% usr 94.1% sys 0.0% nice 0.0% idle 0.0% io 0.0% irq 0.0% softirq
Load average: 1.62 1.39 0.98
  PID  PPID USER STAT  VSZ %MEM %CPU COMMAND
24256  1843 root R    1264  4.1 94.7 /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
 1156     1 root S    1264  4.1  0.0 /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
 1843  1156 root S    1264  4.1  0.0 /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
24462 24460 root S    1216  3.9  0.0 batmand -cb -d2
24857 24815 root S    1216  3.9  0.0 batmand -c -b -r 1

--------------------------------------- Dipl.Informatiker(FH) Stephan Enderlein Freifunk Dresden


Axel Neumann

2 Dec

12:06 p.m.

Hi,

On Tuesday, 2 December 2008, Stephan Enderlein (Freifunk Dresden) wrote:

> I still have the problem that batman-exp hangs at 94% CPU load. Perhaps it has nothing to do with the gateway task. Is it possible to run several "batmand -c" clients at the same time?

Yes, that should be possible. I often connect several debug clients at once, also while changing, for example, the preferred gateway or the gateway class.

Do you feel this problem has arisen with a specific revision (has it been there with rv1069 and before), or has it always been there and your setup has changed?

> The current batman revision is 1154. Do you have any idea?

Not really. Unfortunately I will probably be offline during the next week and can't do much. A completely thread-free version is waiting to be checked in; then we can see whether it helps. But I would actually prefer to nail down the source of the problem...

cu, axel


B.A.T.M.A.N mailing list B.A.T.M.A.N@open-mesh.net https://list.open-mesh.net/mm/listinfo/b.a.t.m.a.n

Stephan Enderlein (Freifunk Dresden)

1:59 p.m.

Hi axel,

> Do you feel this problem has arisen with a specific revision (has it been there with rv1069 and before), or has it always been there and your setup has changed?

The first time I saw this was on revision 1105. But a similar problem already existed on revision 972, where I could still call "batmand -c -r 3" but not with "-d...". I cannot say whether this is still the same problem.

My setup for compiling batmand has not changed. The compile flags I used were:

CFLAGS = -Wall -O1 -DMEMORY_USAGE -DPROFILE_DATA -DDEBUG_MALLOC
LDFLAGS = -lpthread

Today I changed to revision 1171 and now use the following flags:

CFLAGS = -Wall -O2 -g -DDEBUG_MALLOC -DMEMORY_USAGE -DPROFILE_DATA
LDFLAGS = -lpthread

I compile batmand within the whiterussian_rc6 OpenWrt environment. I wanted to create a core file, but it seems that the OpenWrt kernel does not support it (I tried "ulimit -c unlimited" and "kill -6 xxx").

> Not really. Unfortunately I will probably be offline during the next week and can't do much. A completely thread-free version is waiting to be checked in; then we can see whether it helps. But I would actually prefer to nail down the source of the problem...

When do you expect a thread-free version of the batman-experimental branch? I also prefer to solve such problems rather than switch to new code in the hope that the problem goes away.

Bye
Stephan
---------------------------------------
Dipl.Informatiker(FH) Stephan Enderlein
Freifunk Dresden

Stephan Enderlein (Freifunk Dresden)

9 Dec

9:28 a.m.

Hi again,

batmand-exp is still hanging. Below you see the top output and the batmand parameters and HNA output. I also include the network setup and the last connection states. Perhaps it has something to do with one of the options, but I cannot directly verify whether any option is the reason for the hang. batman is connected via WLAN (eth1) and a backbone connection (tbb). I have two routers, a WRT54GL and a WRT54GS, and a vserver (i386). I only have problems on the WRT54GS. Perhaps it has something to do with the "-r 1" option.

Setup:

WRT54GL(eth1=wlan)----(eth1=wlan)WRT54GS(tbb=vpn tunnel)----(tbb)vserver

The vserver also has an additional connection to a different router via tbb. The parameters of batmand are:

WRT54GL: /sbin/batmand -s 10.12.0.1 -a 10.12.10.0/28 -r 2 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
WRT54GS: /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
vserver: /usr/bin/batmand -a 195.42.115.56/32 -a 104.61.0.0/16 -a 105.61.0.0/16 -a 106.61.0.0/16 -s 10.12.0.1 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 wifi tbb /t 1 /i /A

-----------
Another question: in previous versions I have seen that if two batmand instances announce the same HNA IP (-a), one of them is ignored, and all of its batmand traffic is ignored completely. As a result the node was removed from every batmand list and was no longer reachable.
-----------

top:
---------------------
Mem: 15832K used, 14792K free, 0K shrd, 1440K buff, 5972K cached
CPU: 0.0% usr 100% sys 0.0% nice 0.0% idle 0.0% io 0.0% irq 0.0% softirq
Load average: 1.00 0.97 0.92
  PID  PPID USER STAT  VSZ %MEM %CPU COMMAND
 8380  1816 root R    1232  4.0 80.7 /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
 1151     1 root S    1232  4.0  0.0 /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
 1816  1151 root S    1232  4.0  0.0 /sbin/batmand -s 10.12.0.1 -a 10.12.10.16/28 -r 1 --t 63 --no-unreachable-rule --no-throw-rules --no-prio-rules --one-way-tunnel 1 --two-way-tunnel 0 eth1 tbb /t 1 /i /A
23934 23892 root S    1184  3.8  0.0 batmand -c -b -r 1
32567 32525 root S    1184  3.8  0.0 batmand -c -b -r 1
 9105  9063 root S    1184  3.8  0.0 batmand -c -b -r 1
 3421  3379 root S    1184  3.8  0.0 batmand -c -b -r 1
 9038  9036 root S    1184  3.8  0.0 batmand -cb -d2
12030 11988 root S    1184  3.8  0.0 batmand -c -b -r 1
20666 20624 root S    1184  3.8  0.0 batmand -c -b -r 1

Last connection output from batmand:
---------------------------
BatMan-eXp 0.3-alpha (compatibility version 10) !
BatMan-eXp 0.3-alpha, IF eth1 10.12.10.17, LinkWindowSize 100, PathWindSize 100, OGI 1000ms, currSeqno 34756, UT 0:07:12:09, CPU 3/1000, IntTime 25929690
Neighbor     outgoingIF bestNextHop  brc (~rcvd knownSince lseq  lvld rid nid ) [ viaIF RTQ RQ  TQ]..
10.12.10.1   eth1       10.12.10.1    87 (  100 0:00:57:20 43345 0    1   1   ) [ eth1   79 100  79]
172.16.8.1   tbb        172.16.8.1   100 (   99 0:00:56:50 18597 0    2   2   ) [ tbb   100 100 100]
172.16.0.1   tbb        172.16.0.1   100 (   99 0:00:56:50 39090 0    1   3   ) [ tbb   100 100 100]
172.16.0.17  tbb        172.16.0.17  100 (   99 0:00:56:49 10460 0    1   4   ) [ tbb   100 100 100]

Originator   outgoingIF bestNextHop  brc (~rcvd knownSince lseq  lvld pws ogi  cpu hop change ) alternativeNextHops brc ...
10.12.0.1    tbb        172.16.0.1   100 (   99 0:00:56:48 39090 0    100 1015 1   1   5      ) 172.16.0.17 98 172.16.8.1 98
10.12.0.17   tbb        172.16.0.17  100 (   99 0:00:56:48 10460 0    100 1008 3   1   11     ) 172.16.0.1 96 172.16.8.1 98
10.12.10.1   eth1       10.12.10.1    87 (  100 0:00:57:20 43345 0    100 1008 1   -12 1      ) 172.16.0.17 0 172.16.0.1 0
10.12.8.1    tbb        172.16.8.1    99 (   99 0:00:56:48 18597 0    100 1006 0   -12 29     ) 172.16.0.1 96 172.16.0.17 95
4 known Originator(s), averages: 96 ( 99 0 100 1009 1 -5 11 )

Last hna output
--------------------
BatMan-eXp 0.3-alpha (compatibility version 10) !
Originator  Announced networks HNAs: network/netmask or interface/IF (B:blocked)...
10.12.0.1   195.42.115.56/32 104.61.0.0/16 105.61.0.0/16 106.61.0.0/16 172.16.0.1/IF 194.26.180.0/24 77.87.48.0/21 10.2.0.0/16 191.161.0.0/16 104.32.0.0/12 104.0.0.0/11 104.64.0.0/10 104.128.0.0/ 9 10.126.0.0/16 10.124.0.0/15 10.120.0.0/14 10.112.0.0/13
10.12.0.17  172.16.0.17/IF
10.12.10.1  10.12.10.0/28 172.16.10.1/IF
10.12.8.1   10.12.8.0/28 172.16.8.1/IF

last service output:
-------------------
BatMan-eXp 0.3-alpha (compatibility version 10) !
Originator  Announced services ip:port:seqno ...
10.12.0.1   104.61.79.20:80:1 104.61.119.111:21:1 104.61.119.222:21:1 104.61.232.34:80:1 104.61.230.188:80:1 104.61.230.193:80:1

ip addr:
-------------
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:13:10:30:00:fd brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:13:10:30:00:ff brd ff:ff:ff:ff:ff:ff
    inet 10.12.10.17/8 brd 10.255.255.255 scope global eth1
4: br0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 00:13:10:30:00:fd brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.4/24 brd 192.168.1.255 scope global br0
5: vlan0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc noqueue
    link/ether 00:13:10:30:00:fd brd ff:ff:ff:ff:ff:ff
6: vlan1: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc noqueue
    link/ether 00:13:10:30:00:fd brd ff:ff:ff:ff:ff:ff
    inet 192.168.178.25/24 brd 192.168.178.255 scope global vlan1
7: tbb: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 00:ff:15:bf:0b:c0 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.17/16 brd 172.16.255.255 scope global tbb
17: tap1: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:ff:50:70:1e:1d brd ff:ff:ff:ff:ff:ff
154: bat0: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1471 qdisc pfifo_fast qlen 10
    link/[65534]
    inet 10.12.10.17/32 scope global bat0
155: tap0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:ff:15:bf:0b:c0 brd ff:ff:ff:ff:ff:ff

bridges:
--------------
bridge name  bridge id          STP enabled  interfaces
br0          8000.0013103000fd  no           vlan0
                                             tap1
tbb          8000.00ff15bf0bc0  yes          tap0

/stephan

--------------------------------------- Dipl.Informatiker(FH) Stephan Enderlein Freifunk Dresden

Stephan Enderlein (Freifunk Dresden)

10:22 a.m.

Hi,

I have found something strange. The WRT54GL (see previous message) still has the WRT54GS (with the hanging batmand) in its list. tcpdump shows that no packets are sent out of the interfaces eth1 and tbb (an Ethernet bridge with the VPN interface added).

batman output of WRT54GL (working batmand)
-----------------------------------------------
BatMan-eXp 0.3-alpha, IF eth1 10.12.10.1, LinkWindowSize 100, PathWindSize 100, OGI 1000ms, currSeqno 64649, UT 0:12:41:09, CPU 0/1000, IntTime 45669714
Neighbor     outgoingIF bestNextHop  brc (~rcvd knownSince lseq  lvld  rid nid ) [ viaIF RTQ RQ TQ]..
10.12.10.17  eth1       10.12.10.17   86 (  100 0:12:41:08 34780 21317 1   0   ) [ eth1    0 79  0]

Originator   outgoingIF bestNextHop  brc (~rcvd knownSince lseq  lvld  pws ogi  cpu hop change ) alternativeNextHops brc ...
10.12.10.17  eth1       10.12.10.17   86 (  100 0:12:41:08 34780 21317 100 1003 0   -12 1      )
1 known Originator(s), averages: 86 ( 100 21317 100 1003 0 -12 1 )

But the vserver that is connected via tbb (VPN) does not get any OGM from the WRT54GS. So the problem has something to do with the tbb interface (Ethernet bridge). Both routers have the second interface (tbb) added to the parameters of batmand. The difference is that the WRT54GL has no interface added to the bridge (tbb).

tcpdump on bat0 shows that there is a gateway connection to the other internet router, but the following message is sent very rapidly (about 100 per second; I haven't measured it):

tcpdump -tni bat0
IP 10.12.10.17.4306 > 10.12.8.1.4306: UDP, length 1469
IP 10.12.10.17 > 10.12.8.1: ip-proto-17
...
...

After hard-killing the batmand process and restarting it, the rapid messages on bat0 are no longer sent, and I have the following gateway output:

   Originator  bestNextHop  #
=> 10.12.8.1   172.16.8.1   81, gw_class 35 - 1024KBit/512KBit, reliability: 0, supported tunnel types -, 1WT
   10.12.0.1   172.16.0.1   82, gw_class 17 - 256KBit/64KBit, reliability: 0, supported tunnel types -, 1WT

Perhaps this gives a clearer picture of the situation and of why so many UDP packets are sent on bat0.

/stephan

Axel Neumann

11 Dec

10:54 a.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs

Hi,

On Tuesday, 9 December 2008, Stephan Enderlein (Freifunk Dresden) wrote:

> Another question: in previous versions I have seen that if two batmand instances announce the same HNA IP (-a), one of them is ignored, and all of its batmand traffic is ignored completely. As a result the node was removed from every batmand list and was no longer reachable.

That should also be the current behavior of bmx. The background is:

BMX currently does NOT support anycast routing.

Therefore, in practical terms, two nodes announcing the same IP amount to a duplicated IP (an "IP doubler"). Such a scenario triggers the duplicate-address detection, which results in ignoring the younger of the two nodes announcing this IP.

With the batman routing algorithm it is indeed difficult to do consistent anycast routing. I therefore decided to protect against accidental duplicate address announcements with the described behavior. Older versions (2007 and before) tended to have chaotic routing entries due to duplicated entries or removals of HNA routing entries.

When an HNA is added dynamically (using e.g. bmxd -ca 1.2.3.4/32), the daemon checks whether other nodes are already announcing this specific HNA. If so, the announcement is rejected and a warning should be given at debug level 3. You can always inspect current announcements from other nodes using debug level 9. This debug level also differentiates between announced interfaces (e.g. 1.2.3.4/IF) and networks (e.g. 1.2.3.4/32). (The idea for the future was that network announcements should become anycast announcements.)

It is indeed a problem when a daemon is started from the beginning with a duplicate HNA announcement. In this situation the daemon can hardly check whether the network is already announced. Then only its neighboring nodes are aware of the IP doubler. They will ignore everything from this node and issue a warning at debug level 0.
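The rule described above (when two originators announce the same HNA, the younger one is ignored entirely) can be pictured with a small Python sketch. It is only an illustration under assumptions: the class `HnaTable`, its fields, and the use of a `first_seen` timestamp to decide which node is "younger" are hypothetical and are not taken from the bmxd source.

```python
from dataclasses import dataclass, field


@dataclass
class HnaTable:
    """Toy model of duplicate-HNA handling (hypothetical, not bmxd code)."""
    owners: dict = field(default_factory=dict)  # HNA -> (originator, first_seen)
    ignored: set = field(default_factory=set)   # originators that are ignored

    def announce(self, originator, hna, first_seen):
        """Process one announcement; return True if it is accepted."""
        if originator in self.ignored:
            return False  # everything from an ignored node is dropped
        owner = self.owners.get(hna)
        if owner is None or owner[0] == originator:
            self.owners.setdefault(hna, (originator, first_seen))
            return True
        # Conflict: keep the node known for longer, ignore the younger one.
        older, younger = sorted([owner, (originator, first_seen)],
                                key=lambda entry: entry[1])
        self.ignored.add(younger[0])
        # Drop every HNA of the ignored node, then record the surviving owner.
        self.owners = {h: o for h, o in self.owners.items() if o[0] != younger[0]}
        self.owners[hna] = older
        return older[0] == originator
```

Note that in this model one conflicting HNA is enough to silence the younger node for all of its announcements, which is the behavior reported at the start of this sub-thread.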

hope this clarifies a bit.

/axel

Freifunk

2:01 p.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs

Hi Axel,

Thanks for your description. Would it be better to ignore only the HNA that has already been received from another node instead of ignoring the node completely? I currently use one node that is connected to icvpn (inter-city VPN), which connects different Freifunk cities together. The network runs BGP and allows setting up a second server that also initiates a connection to the icvpn. Both servers would then announce the same IP addresses. In this case the batman nodes should only ignore the HNAs if they are injected by different icvpn servers, and still keep the server in the network.

Bye stephan


Axel Neumann

12 Dec

10:46 a.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs

Hi,

On Thursday, 11 December 2008, Freifunk wrote:

> Thanks for your description. Would it be better to ignore only the HNA that has already been received from another node instead of ignoring the node completely?

Sounds reasonable. As for the considerations from those days: there were a number of reasons for the current approach, but some things have changed in the meantime.

1. The hope to find a consistent anycast routing approach soon, which would eliminate the duplicate HNA problem.

2. Secondary interfaces were conceptually announced in the same way as HNAs. Signaling secondary-interface information to all other nodes was essential for many scenarios, because traffic originating at the node and leaving via a secondary interface usually had the IP address of the secondary interface as its source address. Corresponding nodes could then not reply unless they had a correct HNA route to the originating source. This changed some time ago. Now the bmx daemon sets the IP of the primary interface as the preferred source address for all interfaces, making the interface announcement non-obligatory. You can see this by looking at the default bmx routing table with "ip r ls t 64", which should always show the primary-interface IP as the src address.

3. I think there were theoretical scenarios for routing loops if the routing entries in different nodes for a given HNA point to different destination nodes. But I cannot remember them right now.
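The check mentioned in point 2 (every route in table 64 should carry the primary-interface IP as src) can be automated with a few lines of Python. This is a minimal sketch: the function name and the sample route lines are hypothetical, and it assumes the lines were captured from "ip r ls t 64" and contain a "src <address>" pair as iproute2 normally prints it.

```python
def routes_with_unexpected_src(route_lines, primary_ip):
    """Return the route lines whose 'src' address is not the primary IP."""
    unexpected = []
    for line in route_lines:
        fields = line.split()
        if "src" not in fields:
            continue  # route without an explicit preferred source address
        position = fields.index("src") + 1
        if position < len(fields) and fields[position] != primary_ip:
            unexpected.append(line)
    return unexpected


# Hypothetical sample lines, not taken from a real routing table.
sample = [
    "10.12.0.1 via 172.16.0.1 dev tbb src 10.12.10.17",
    "10.12.8.1 via 172.16.8.1 dev tbb src 172.16.10.17",
]
print(routes_with_unexpected_src(sample, "10.12.10.17"))
```

An empty result would mean that all listed routes use the primary address as their source.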

so far, /axel


Stephan Enderlein (Freifunk Dresden)

11:51 p.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs

Hi,

I am not deeply enough involved in batman routing to find a solution. I hope you can find a way, but for now it is not so important. However, if one node announces an HNA, a different node that just wants to have fun "turning off" this node can simply send the same HNA. If you say the first HNA is the right one, what happens when this node gets the forced disconnection after 24 hours by its internet provider?

I think it is difficult to find a solution for this. Would it be best to keep all nodes active but drop the HNA if it is not reachable?

/Stephan

Axel Neumann

17 Dec

8:14 p.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs

Hi,

On Saturday, 13 December 2008, Stephan Enderlein (Freifunk Dresden) wrote:

> I am not deeply enough involved in batman routing to find a solution. I hope you can find a way, but for now it is not so important. However, if one node announces an HNA, a different node that just wants to have fun "turning off" this node can simply send the same HNA. If you say the first HNA is the right one, what happens when this node gets the forced disconnection after 24 hours by its internet provider?

Theoretically, if the node can re-establish a new connection after its forced disconnection within the DAD timeout (100 secs by default), it should not be kicked out. But the precondition for this is that the node re-appears using the same primary IP for its primary interface and continues with the expected sequence-number range.
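The two conditions above (back within the DAD timeout, and continuing in the expected sequence-number range) can be written down as a small Python predicate. This is a sketch under stated assumptions: only the 100-second default comes from this thread; the 16-bit wrap-around and the width `seqno_window` are hypothetical, and the function says nothing about what the daemon does when the conditions are not met.

```python
SEQNO_MODULO = 2 ** 16  # assumption: sequence numbers wrap at 16 bits


def continues_known_node(last_seqno, last_seen, new_seqno, now,
                         dad_timeout=100.0, seqno_window=100):
    """True if a re-appearing node (same primary IP) looks like the node
    seen before: back within dad_timeout seconds and with a sequence
    number slightly ahead of the last one that was received."""
    within_timeout = (now - last_seen) <= dad_timeout
    ahead = (new_seqno - last_seqno) % SEQNO_MODULO
    return within_timeout and 0 < ahead <= seqno_window


# Back after 60 s with a plausible next sequence number.
print(continues_known_node(34756, 0.0, 34800, 60.0))
# Back after 60 s, but restarted with a fresh sequence number.
print(continues_known_node(34756, 0.0, 1, 60.0))
```

The second call illustrates why a restarted daemon may need its initial sequence number corrected before neighbors treat it as the same node.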

best, axel


Stephan Enderlein (Freifunk Dresden)

18 Dec

11:11 a.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs / batmand / certificates

Hi Axel,

> Theoretically, if the node can re-establish a new connection after its forced disconnection within the DAD timeout (100 secs by default), it should not be kicked out. But the precondition for this is that the node re-appears using the same primary IP for its primary interface and continues with the expected sequence-number range.

The router will always have the same IP. But it can take a little time to establish the VPN connection (over the internet) that connects it to other batman clouds, so the 100 seconds can easily be exceeded. At the moment the firmware has no way to set the time of the forced disconnection. But if users run a different router, or the firmware later supports a timed disconnection, users may well leave the default time. Assuming this, others may connect their disturbing routers at that same time to turn off attractive nodes with many connections.

All nodes must be reachable, and they must ping the HNA regularly (if ping is supported, or check for certain services) and tell the local batmand to remove a specific HNA from its internal list. But this is also not secure, because the bad guy may redirect such requests and pretend the IP is reachable.

I think there must be a central server that collects "HNA requests"; if they are valid, the batman node that requested to propagate an HNA can add it. For this, batman should support such a procedure internally, perhaps like the visualisation server; otherwise the bad guy may simply add an HNA on the command line. It should be possible to build two versions of batmand: one that acts as HNA server (only a few authenticated servers) and one as HNA client (requesting to publish HNAs, and built into all nodes).

To prevent a faked batmand client from disturbing a mesh, batman should support certificates to protect its OGMs and other packets. Is there a way to make a batman network unique by using certificates? batman should always send signed OGMs and other batman packets and also check their correctness on reception.

For HNA authentication and signed batman traffic you could use cacert.org. If someone then tries to disturb a batman network by setting up their own HNA server, you can just look into the certificate to identify the user.

Cheers Stephan

--------------------------------------- Dipl.Informatiker(FH) Stephan Enderlein Freifunk Dresden

Axel Neumann

19 Dec

10:15 a.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs / certificates

Hi,

I like brainstorming like this. We wanted batmand (and especially its core routing algorithm) to be decentralized and simple. So no central point of control/failure, and therefore also no HNA server. Of course there are many potential attack vectors in a community mesh, and there probably always will be unless you completely restrict access. Therefore, IMHO, the security problems that should preferably be solved are:

- detect and protect against (usually accidental) misconfigurations like duplicate addresses.

- find mechanisms to limit the impact of denial-of-service or other attacks to the local environment (neighborhood).

Certificates and signatures might be a theoretical solution. I am not a security expert, but many people have stated that frequent signature validation (e.g. for every OGM) will definitely exceed the CPU performance of the small embedded devices we use for our networks.

On Thursday, 18 December 2008, Stephan Enderlein (Freifunk Dresden) wrote:

> Theoretically, if the node can re-establish a new connection after its forced disconnection within the DAD timeout (100 secs by default), it should not be kicked out. But the precondition for this is that the node re-appears using the same primary IP for its primary interface and continues with the expected sequence-number range.

> The router will always have the same IP. But it can take a little time to establish the VPN connection (over the internet) that connects it to other batman clouds, so the 100 seconds can easily be exceeded. At the moment the firmware has no way to set the time of the forced disconnection. But if users run a different router, or the firmware later supports a timed disconnection, users may well leave the default time. Assuming this, others may connect their disturbing routers at that same time to turn off attractive nodes with many connections.

You can tune the duplicate-address detection timeout using --dad-timeout.

Because the duplicate-address detection works on the basis of expected sequence numbers, you can avoid being ignored by other nodes after a restart by setting your initial sequence number to a value accepted by the other nodes, using --initial-seqno.

ciao, axel


Stephan Enderlein (Freifunk Dresden)

11:06 a.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs / certificates

Hi,

> I like brainstorming like this.

Me too.

> We wanted batmand (and especially its core routing algorithm) to be decentralized and simple. So no central point of control/failure, and therefore also no HNA server.

Perhaps there is a different solution. What if everybody broadcasts their HNAs just as batman currently works, and batmand gets a list of router IPs from which HNAs are accepted? The bad guy normally has no way to modify the firmware of other routers and cannot tell their batmand to accept his faulty HNA. In this case batman can be updated regularly by a cron job and only needs to check HNAs against its list. A positive and a negative list should be possible. Perhaps the list may contain network ranges. (hcl = HNA control list)

The firmware of the router may request the list from a server. If a non-accepted HNA is received, batmand may completely ignore the node that is injecting the invalid HNA. If I understand you correctly, batmand currently completely ignores nodes that are sending the same HNA?
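The proposed HNA control list (a positive and a negative list of router addresses, optionally given as network ranges) could look roughly like the following Python sketch. Nothing like this exists in batmand as far as this thread shows; the function name, the list contents, and the choice that the negative list wins over the positive list are assumptions made for illustration.

```python
import ipaddress


def originator_may_announce(originator, allow, deny=()):
    """Decide whether HNAs from this originator are accepted.

    allow and deny are iterables of networks in CIDR notation; a single
    router is written as a /32. The negative list takes precedence.
    """
    address = ipaddress.ip_address(originator)

    def listed(networks):
        return any(address in ipaddress.ip_network(net) for net in networks)

    if listed(deny):
        return False
    return listed(allow)


# Hypothetical lists, e.g. fetched from a server by a cron job.
hcl_allow = ["10.12.0.1/32", "10.12.8.0/28"]
hcl_deny = ["10.12.8.5/32"]
print(originator_may_announce("10.12.8.1", hcl_allow, hcl_deny))
print(originator_may_announce("10.12.8.5", hcl_allow, hcl_deny))
```

Letting the negative list win keeps a single misbehaving router blockable even when its whole range is on the positive list.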

/stephan

--------------------------------------- Dipl.Informatiker(FH) Stephan Enderlein Freifunk Dresden

Stephan Enderlein (Freifunk Dresden)

11:19 a.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs / certificates

Hi again,

Is there a way to set a TTL value for each HNA that differs from the OGM TTL? If I assume that an HNA internet host is reachable via two nodes (e.g. running icvpn - bgp), batmand currently ignores one of these HNAs and also the node and its traffic (right?).

What if we use the TTL value as a metric to decide which HNA is used? In this case both nodes are still present in the network, but you don't have an address conflict. Batman should not accept an HNA that belongs to the IP ranges of the batman network. Then a node with IP 10.12.0.1 cannot send an HNA for 10.12.10.17 and disturb the routing. Perhaps batmand already checks this?
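
The two checks proposed above could look roughly like the following Python sketch. It does not describe what batmand does. The mesh prefix 10.12.0.0/16 is an assumption derived from the example addresses, the function names are hypothetical, and it assumes the TTL is decremented once per hop, so a higher remaining TTL means a closer announcer.

```python
import ipaddress

# Assumed address range of the batman network itself.
MESH_RANGE = ipaddress.ip_network("10.12.0.0/16")

def hna_inside_mesh(hna, mesh=MESH_RANGE):
    """True if the announced network lies inside the mesh's own range."""
    return ipaddress.ip_network(hna).subnet_of(mesh)

def pick_announcer(announcements):
    """Choose one announcer among duplicates of the same HNA.

    `announcements` is a list of (originator, remaining_ttl) tuples.
    The originator with the highest remaining TTL is preferred.
    """
    return max(announcements, key=lambda a: a[1])[0]

# An HNA for a mesh-internal address would be rejected ...
print(hna_inside_mesh("10.12.10.17/32"))
# ... while an external network would pass this check.
print(hna_inside_mesh("192.168.1.0/24"))
# Two nodes announce the same HNA; both stay in the network.
print(pick_announcer([("10.12.0.1", 48), ("10.12.0.2", 50)]))
```

`subnet_of` is used instead of `overlaps` so that a default-route HNA such as 0.0.0.0/0, which contains the mesh range, is not rejected by accident.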

/stephan

Resul Cetin

24 Dec

6:24 p.m.

New subject: [B.A.T.M.A.N.] Testing Batman

Hii,

I am running some tests with BATMAN. My test scenario is very simple: at the moment it consists of a router and a laptop.

My problem is that I can't run Iperf to test the network. The ping response time is also too high.

I start batmand on my laptop with "sudo batmand wlan0".

The Iperf tests break off after only 10-15 seconds. Here is my Iperf output:

[ 3] local 11.1.1.23 port 49672 connected with 129.217.186.46 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 640 KBytes 1.05 Mbits/sec
[ 3] 5.0-10.0 sec 640 KBytes 1.05 Mbits/sec
[ 3] 10.0-15.0 sec 640 KBytes 1.05 Mbits/sec
write2 failed: Network is down
[ 3] 0.0-15.9 sec 1.98 MBytes 1.05 Mbits/sec
[ 3] Sent 1415 datagrams
[ 3] WARNING: did not receive ack of last datagram after 10 tries.

The ping results are:

64 bytes from 11.1.1.3: icmp_seq=1377 ttl=64 time=2.09 ms
64 bytes from 11.1.1.3: icmp_seq=1378 ttl=64 time=2.10 ms
64 bytes from 11.1.1.3: icmp_seq=1379 ttl=64 time=6.69 ms
64 bytes from 11.1.1.3: icmp_seq=1380 ttl=64 time=9.33 ms
64 bytes from 11.1.1.3: icmp_seq=1381 ttl=64 time=12.07 ms
64 bytes from 11.1.1.3: icmp_seq=1382 ttl=64 time=27.9 ms
64 bytes from 11.1.1.3: icmp_seq=1384 ttl=64 time=5.16 ms

Can anybody help me fix this problem? I didn't configure an HNA on my router; could this be the problem?

I would very much appreciate it if anybody could help me.

greetings, E

Alexander Morlang

6 Jan

1:57 p.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs / certificates

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Axel Neumann wrote:

...

HI,

I like brainstorming like this. We wanted batmand (and especially its core routing algorithm) to be decentral and simple. So no central point of control/failure and therefore also no HNA server. Of course there are many potential attack vectors in a community mesh and probably there will always be until you completely restrict the access. Therefore IMHO the preferable security to be solved should be:

detect and protect against (usually accidental) misconfigurations like duplicate addresses.

Sure, a duplicate address is something the routing protocol has to detect and react to, but duplicate HNAs are very important and widely accepted in the internet community. They are called anycast and are a vital instrument in network design and deployment.

As an example, anycast is used for DNS root servers, 6to4 tunnels and many other use cases.

I still don't understand why you are discussing removing something as important as anycast.

Anycast is a way to use distributed services: you can announce an anycast address on every node that provides a specific service, and packets will be routed to the nearest service provider.

...

find mechanisms to limit the impact of denial of service or other attacks to the local environment (neighborhood).

Regards, Alex


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkljYz0ACgkQhx2RbV7T5aESngCgm0gopTcK+C17sHB29nz4jfsY 5JcAmgIS2EXnvL37QfFU/mAxnBRAQDMe =nJZR -----END PGP SIGNATURE-----

Axel Neumann

15 Jan

2:24 p.m.

New subject: [B.A.T.M.A.N.] dublicate HNAs / certificates

Hi,

On Tuesday 06 January 2009, Alexander Morlang wrote:

...

Axel Neumann wrote:

...
We wanted batmand (and especially its core routing algorithm) to be decentral and simple. So no central point of control/failure and therefore also no HNA server. Of course there are many potential attack vectors in a community mesh and probably there will always be until you completely restrict the access. Therefore IMHO the preferable security to be solved should be:

detect and protect against (usually accidental) misconfigurations like duplicate addresses.

Sure, a duplicate address is something the routing protocol has to detect and react to, but duplicate HNAs are very important and widely accepted in the internet community. They are called anycast and are a vital instrument in network design and deployment.

As an example, anycast is used for DNS root servers, 6to4 tunnels and many other use cases.

I still don't understand why you are discussing removing something as important as anycast.

I think nobody wants to remove it. I wanted to point out that real anycast routing has never been supported by batman/bmx and that our HNA features should NOT be confused with anycast routing. The problem is that the concept of anycast routing does not easily fit into the batman routing algorithm, which relies on a single source of originator messages (OGMs) for any given destination.

I agree that the lack of anycast routing support is a problem and not a feature, especially when talking about quagga/zebra-like route exchange between different autonomous systems.

ciao, axel


Resul Cetin

17 Dec

12:19 p.m.

New subject: [B.A.T.M.A.N.] Can't build a mesh network

Hiii,

I'm a beginner with mesh networks. At the moment I have a problem building a mesh network with batman between two Linksys routers and a laptop. I get a ping response both on the router side and on the laptop side, so there is a connection.

But I can't run batmand to establish a mesh network...

I ran the command "sudo batmand eth1 wlan0:test" to build the connection and then "sudo batmand -c -d 1", but the second command gives this error message: "Error - can't connect to unix socket '/var/run/batmand.socket': No such file or directory ! Is batmand running on this host ?"

My batman version is "B.A.T.M.A.N. 0.2 (compability version 3)".

Can anybody help me fix this problem?

greetings, R

Resul Cetin

1:57 p.m.

New subject: [B.A.T.M.A.N.] Can't build a mesh network

Hello people...

It's me again. I fixed the problem that I mentioned in the previous message. I chose the wrong interface; that was the mistake.

But I still have a problem...

After maybe 10 minutes, the connection between the laptop and the routers breaks...

Does anybody have experience with this issue?

Greetings, R


elektra

5:46 p.m.

New subject: [B.A.T.M.A.N.] Can't build a mesh network

Hi -

sure, we have. Ad-Hoc mode of 802.11 is broken by design. Keywords: IBSS merges and time stamps in beacons. There are some hacks which work exclusively for the Madwifi driver with the patches of a recent Openwrt Kamikaze developer version. You can use the ad-hoc demo mode or 'real' IBSS mode with the option "nosbeacon" when creating the VAP with

"wlanconfig ath0 create wlandev wifi0 wlanmode adhoc nosbeacon"

This is the only option for a PC at the moment. Works rock solid even in the Berlin Freifunk mesh with ~500 ad-hoc interfaces where MAC timestamps permanently jump back and forth.

The closed-source binary Broadcom driver for Linux 2.4 also works, mostly. There have been reports about issues in conjunction with patched Madwifi versions (software-generated timestamps are not precise enough and can confuse the Broadcom driver). These issues can be avoided by using ad-hoc demo mode, where you fix the IBSS-ID.

Cheers, elektra


Marek Lindner

18 Dec

6:32 a.m.

New subject: [B.A.T.M.A.N.] Can't build a mesh network

Hi,

...

It's me again. I fixed the problem that I mentioned in the previous message. I chose the wrong interface; that was the mistake.

ok, thanks for letting us know.

...

But I still have a problem...

After maybe 10 minutes, the connection between the laptop and the routers breaks...

Does anybody have experience with this issue?

As Elektra mentioned, this problem might be related to the wifi layer itself and not to batman. You can try to connect to your routers from your notebook without using batman (configuring the IPs and setting the routing table entries manually). If the connection still breaks, it's a wifi problem.

Btw, you should consider upgrading to batman 0.3 (0.2 is quite outdated). Version 0.3 is our current stable. Also, expect the 0.3.1 release soon.

Marek
