Running batctl 2020.1-openwrt-1 [batman-adv: 2020.1-openwrt-2]
When running a two-node network with one node connected to my LAN and the other operating as an access point, my network works great. I can connect clients to my batman nodes and access my LAN.
When I boot up a third node, my network works for 1 minute, then breaks down. My LAN cannot ping any of the batman nodes anymore.
I keep receiving messages like this: "[ 2900.755655] br-lan: received packet on bat0 with own address as source address (addr:8c:ae:4c:db:14:5c, vlan:0)", which I think indicates a bridge loop.
My originator table looks wrong, as I can see my own host's originator entries along with those of all the neighbor nodes:
root@OpenWrt:/etc/config# batctl o -n
[B.A.T.M.A.N. adv 2020.1-openwrt-2, MainIF/MAC: mesh0/00:30:1a:4e:b8:26 (bat0/f2:07:f1:5f:e0:78 BATMAN_V)]
   Originator        last-seen ( throughput)  Nexthop           [outgoingIF]
 * 00:30:1a:4e:b8:18    0.570s (       86.7)  00:30:1a:4e:b8:2e [     mesh0]
   00:30:1a:4e:b8:18    0.570s (       21.6)  00:30:1a:4e:b8:18 [     mesh0]
 * 00:30:1a:4e:b8:2e    1.510s (      212.6)  00:30:1a:4e:b8:2e [     mesh0]
   00:30:1a:4e:b8:2e    1.510s (       38.9)  00:30:1a:4e:b8:18 [     mesh0]
   00:30:1a:4e:b8:26    1.510s (       38.9)  00:30:1a:4e:b8:18 [     mesh0]
 * 00:30:1a:4e:b8:26    1.510s (      108.9)  00:30:1a:4e:b8:2e [     mesh0]

root@OpenWrt:/etc/config# batctl n -n
[B.A.T.M.A.N. adv 2020.1-openwrt-2, MainIF/MAC: mesh0/00:30:1a:4e:b8:26 (bat0/f2:07:f1:5f:e0:78 BATMAN_V)]
IF             Neighbor              last-seen
00:30:1a:4e:b8:2e    0.490s (      179.0) [     mesh0]
00:30:1a:4e:b8:18    0.380s (       79.2) [     mesh0]
Here is my /etc/config/network:

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdc4:e092:8929::/48'

config interface 'lan'
        option type 'bridge'
        option proto 'static'
        option ipaddr '192.168.0.32'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option gateway '192.168.0.1'
        list dns '8.8.8.8'
        option ifname 'bat0 eth0'

config interface 'nwi_mesh0'
        option mtu '2304'
        option proto 'batadv_hardif'
        option master 'bat0'

config interface 'bat0'
        option proto 'batadv'
        option routing_algo 'BATMAN_V'
        option aggregated_ogms '1'
        option ap_isolation '0'
        option bonding '0'
        option fragmentation '1'
        option gw_mode 'server'
        option log_level '0'
        option orig_interval '1000'
        option bridge_loop_avoidance '1'
        option distributed_arp_table '1'
        option multicast_mode '1'
        option network_coding '0'
        option hop_penalty '30'
        option isolation_mark '0x00000000/0x00000000'
And here is my /etc/config/wireless:

root@OpenWrt:/etc/config# cat wireless
config wifi-device 'radio0'
        option type 'mac80211'
        option channel '36'
        option hwmode '11a'
        option path 'soc0/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
        option htmode 'VHT80'

config wifi-iface 'mesh0'
        option device 'radio0'
        option ifname 'mesh0'
        option network 'nwi_mesh0'
        option mode 'mesh'
        option mesh_fwding '0'
        option mesh_id 'batman_mesh'
        option encryption 'none'

config wifi-iface 'wifinet0'
        option device 'radio0'
        option mode 'ap'
        option ssid 'N2-Lander'
        option encryption 'psk2'
        option key 'finnjamin'
        option ifname 'wlan0'
        option network 'lan'
Any and all help is greatly appreciated.
Hi Luke,
can you please describe which nodes are connected to the LAN and which are not? You say "one is connected to LAN" and the others are "operating as an access point", does that mean they are not connected to the same LAN via Ethernet?
If multiple nodes are connected and bridged to the same LAN, bridge loop avoidance should be enabled - you have that in your config, but you could double check with "batctl bl" and then "batctl bbt"/"batctl cl" (please post these tables if you think this could be connected).
You could also try disabling distributed arp table and multicast mode, just to make sure this is not shooting us in the foot here. Those optimizations are not really needed for such a small network.
Cheers, Simon
On Tuesday, July 7, 2020 9:47:31 PM CEST lavincent15@gmail.com wrote:
"can you please describe which nodes are connected to the LAN and which are not? You say "one is connected to LAN" and the others are "operating as an"
00:30:1a:4e:b8:26 is the only node that is connected via eth0 to my LAN. All three nodes are running mesh point and AP on mesh0 and wlan0 respectively.
"If multiple nodes are connected and bridged to the same LAN, bridge loop avoidance should be enabled - you have that in your config, but you could double check with "batctl bl" and then "batctl bbt"/"batctl cl" (please post these tables if you think this could be connected). You could also try disabling distributed arp table and multicast mode, just to make sure this is not shooting us in the foot here. Those optimizations are not really needed for such a small network."
I see. Since I only have one node connected to the LAN via eth0, I do not need BLA. I just tried disabling multicast mode, BLA, and the distributed ARP table. So far so good! It's working. Thank you so much for the help!
Simon,
When I enable DAT on all of my nodes, the network breaks down. With DAT disabled on all the nodes, the network works fine.
As I develop my project, I would like to take advantage of DAT, the mesh-wide ARP caching feature. Is there any way I can fix things so that DAT will work on my network?
Thanks, Luke
Hi Luke,
On Wed, Jul 08, 2020 at 03:26:49PM -0000, lavincent15@gmail.com wrote:
> Simon,
> When I enable DAT on all of my nodes, the network breaks down. With DAT disabled on all the nodes, the network works fine.
> As I develop my project, I would like to take advantage of the mesh wide ARP caching feature DAT. Is there any way I can fix things to where DAT will work on my network?
> Thanks, Luke
Would it be possible for you to try an older version of batman-adv, like v2019.0? There were a few new feature additions for DAT after that one.
Btw. did you also try disabling aggregation with your batman-adv 2020.1 version? That didn't make a difference, right?
(Disabling aggregation for BATMAN_V in v2019.0 won't make a difference, as it wasn't implemented there yet, so it would be great if you could try that with 2020.1.)
Regards, Linus
On Thu, Jul 09, 2020 at 10:33:44PM +0200, Linus Lüssing wrote:
> Hi Luke,
> On Wed, Jul 08, 2020 at 03:26:49PM -0000, lavincent15@gmail.com wrote:
> > Simon,
> > When I enable DAT on all of my nodes, the network breaks down. With DAT disabled on all the nodes, the network works fine.
> > As I develop my project, I would like to take advantage of the mesh wide ARP caching feature DAT. Is there any way I can fix things to where DAT will work on my network?
> > Thanks, Luke
> Would it be possible for you to try an older version of batman-adv, like v2019.0? There were a few new feature additions for DAT after that one.
> Btw. did you also try disabling aggregation with your batman-adv 2020.1 version? That didn't make a difference, right?
> (disabling aggregation for BATMAN_V in v2019.0 won't make a difference as it wasn't implemented there yet, so if you could try that with 2020.1 would be great)
> Regards, Linus
Hi Luke,
Any news, especially regarding aggregation?
A likely bug in the aggregation code was found; a description and a potential patch can be found here:
https://www.open-mesh.org/issues/413
Would be great if you could check if this is related to your issue or not.
Regards, Linus
Linus,
I have a working network with aggregated_ogms enabled and DAT disabled. I just tried disabling aggregated_ogms and the network continued to function properly. I then enabled DAT and the network continued to function properly. So it seems I just cannot have aggregated_ogms and DAT enabled at the same time.
Thanks, Luke
On Fri, Jul 24, 2020 at 03:00:33PM -0000, lavincent15@gmail.com wrote:
> Linus,
> I have a working network with aggregated_ogms enabled and DAT disabled. I just tried disabling aggregated_ogms and the network continued to function properly. I then enabled DAT and the network continued to function properly. So it seems I just cannot have aggregated_ogms and DAT enabled at the same time.
> Thanks, Luke
Hi Luke,
Awesome, great news! That strengthens the suspicion that the issue in the aggregation code is the main cause.
Would it be possible for you to try the patch from the ticket and check whether it allows you to enable both DAT and aggregation?
https://www.open-mesh.org/issues/413 => https://git.open-mesh.org/batman-adv.git/commit/0115502eab54a80f2c05884efce6...
Cheers, Linus
Linus
I would love to try it out and help with the development, but unfortunately I do not have the time to do that. My internship is coming to a close, and I need to use a version I know works to provide good data.
Side note: Do you think you could provide me with a rough equation for when a node decides to use a hop instead of a direct connection? I'm particularly interested in how the nodes use the hop penalty in the equation. Does shortening the interval make a node reach its decision faster? I'm using this in a mobile node environment and I need it to dynamically switch to the most stable connection.
Thanks, Luke
On Mon, Jul 27, 2020 at 04:28:15PM -0000, lavincent15@gmail.com wrote:
> Linus
> I would love to try it out and help with the development, but unfortunately I do not have the time to do that. My internship is coming to a close, and I need to use a version I know works to provide good data.
Oh, okay, good luck with the results then!
> Side note* Do you think you could provide me with a rough equation for when a node decides to use a hop instead of a direct connection? I'm particularly interested in how the nodes use the hop penalty in the equation. Does speeding up the interval increase the speediness of its decision? I'm using this in a mobile node environment and I need it to dynamically switch to the most stable connection.
The metric, including the hop-penalty for BATMAN V is described here:
https://www.open-mesh.org/projects/batman-adv/wiki/Ogmv2#322-Metric-Update
Or here in the code, in this short function:
https://elixir.bootlin.com/linux/v5.7.8/source/net/batman-adv/bat_v_ogm.c#L4...
The algorithm then compares whether the resulting throughput metric is higher via the direct connection or via another hop, after the hop penalty or the half-duplex penalty (whichever applies) has been applied.
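As a rough illustration of that comparison, here is a minimal Python sketch. It is not the kernel code, only one reading of the linked function under stated assumptions: throughput is an integer in units of 100 kbit/s, a relay forwarding on the same half-duplex WiFi interface halves the value (unless it is already very low), otherwise the hop penalty scales it by (255 - hop_penalty) / 255, and the receiver takes the minimum of the advertised value and its own link throughput to the neighbor. The function names and the relay's figure of 1734 are hypothetical; the hop penalty of 30 is the value from the config posted earlier in the thread.

```python
TQ_MAX = 255  # assumption: a hop_penalty of 255 corresponds to a 100 % penalty


def forward_penalty(throughput, same_half_duplex_iface, hop_penalty=30):
    """Throughput (100 kbit/s units) a relay advertises when forwarding an OGM."""
    # Store-and-forward on the same half-duplex WiFi interface halves the
    # throughput; very low values (<= 1 Mbit/s) get the hop penalty instead.
    if throughput > 10 and same_half_duplex_iface:
        return throughput // 2
    return throughput * (TQ_MAX - hop_penalty) // TQ_MAX


def path_throughput(advertised, link_to_neighbor):
    """A path is only as fast as its slowest part."""
    return min(advertised, link_to_neighbor)


def prefers_hop(direct_link, relay_link, relay_to_dest,
                same_half_duplex_iface=True, hop_penalty=30):
    """True if the route via the relay beats the direct link."""
    advertised = forward_penalty(relay_to_dest, same_half_duplex_iface, hop_penalty)
    return path_throughput(advertised, relay_link) > direct_link


# Hypothetical numbers: direct link 21.6 Mbit/s, link to the relay 212.6 Mbit/s,
# relay's own path to the destination 173.4 Mbit/s.
print(prefers_hop(direct_link=216, relay_link=2126, relay_to_dest=1734))
```

With these illustrative numbers the relay would advertise 867 (86.7 Mbit/s), which is still well above the 21.6 Mbit/s direct link, so the sketch prefers the hop. With a direct link of 90 Mbit/s (900) it would stay on the direct connection.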
Hope that helps.
Regards, Linus
b.a.t.m.a.n@lists.open-mesh.org