Before adding radios to my setup, I connected two computers with three NICs each and added all three interfaces to the mesh interface bat0 on each machine. When I run iperf across it, all the traffic seems to go over one interface. I run iperf3 with -P 4 so that there are multiple streams. Enabling bonding does not seem to change the behavior.
batctl o - shows all three interfaces
batctl n - shows three interfaces. This seemed odd to me, as it is one neighbor across three links.
batctl tg - shows all clients via one address
If anyone can point me at what to look at next, or at what might be wrong, that would help.
I am using BATMAN_V version 2019.4 in kernel 5.4.68.
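One way to confirm how the traffic is distributed is to compare the kernel's per-interface TX byte counters before and after an iperf3 run. The sketch below assumes the standard Linux sysfs layout (/sys/class/net/<iface>/statistics/tx_bytes); the function names are illustrative and are not part of batctl.

```python
from pathlib import Path


def read_tx_bytes(ifaces, sysfs_root="/sys/class/net"):
    """Return {iface: tx_bytes} read from the per-interface sysfs counters."""
    return {
        name: int((Path(sysfs_root) / name / "statistics" / "tx_bytes").read_text())
        for name in ifaces
    }


def tx_share(before, after):
    """Return each interface's fraction of the bytes sent between two snapshots."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total == 0:
        # Nothing was sent in the window, so every share is zero.
        return {name: 0.0 for name in delta}
    return {name: sent / total for name, sent in delta.items()}
```

Take one snapshot with read_tx_bytes(["eth0", "eth1", "eth2"]) before the test and one after it, then pass both to tx_share. A result close to 1.0 for a single interface matches the behavior described above.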
On Thursday, September 9, 2021 10:09:39 PM CEST brian.edmisten@viasat.com wrote:
[...]
Hi Brian,
can you perhaps post the output of those commands?
If bonding works, it would even spread one iperf stream among the multiple links. For bonding to work, the TQ values must be on a similar level, otherwise it will not be activated.
I haven't really tried bonding with BATMAN V, you may want to try with BATMAN IV instead.
Please note that bonding will schedule the packets over the available interfaces, but will not perform any reordering on the receiver side. This can upset TCP, which treats reordering as loss. In experiments with WiFi links, I often actually got degraded performance because the queue depths of the WiFi links were growing differently, thereby causing reordering ...
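The reordering effect can be illustrated with a small simulation. This is a simplified sketch, not batman-adv code: it assumes one packet is sent per time unit, strict round-robin scheduling, and a fixed per-link delay that stands in for queue depth.

```python
def round_robin_arrivals(num_packets, link_delays):
    """Send packets 0..n-1 round-robin over the links; return the arrival order.

    link_delays[i] is the one-way delay of link i, in the same unit as the
    send interval (one packet is sent per time unit).
    """
    arrivals = []
    for seq in range(num_packets):
        link = seq % len(link_delays)
        arrivals.append((seq + link_delays[link], seq))  # (arrival time, seq)
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


print(round_robin_arrivals(6, [1, 1]))  # equal delays keep the order
print(round_robin_arrivals(6, [1, 4]))  # unequal delays interleave the packets
```

With equal delays the packets arrive in sequence. Once one link lags behind, packets from the faster link overtake it, and a TCP receiver sees this as gaps.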
Cheers, Simon
Simon,
Thanks for responding. We are trying out some different solutions for bonding these radios. For our scenarios BATMAN seems really well suited to the problem, but we wanted to test this feature and see how much work we need to put into it. I saw the same behavior with IV, but I'll switch back and check on it. While it's up, though, here is what I am seeing with V.
batctl o
[B.A.T.M.A.N. adv 2019.4, MainIF/MAC: eth0/00:0c:29:c5:d2:da (bat0/de:8b:cc:39:d0:69 BATMAN_V)]
   Originator        last-seen ( throughput)  Nexthop           [outgoingIF]
   00:0c:29:53:f8:c9    0.320s (    10000.0)  00:0c:29:53:f8:dd [      eth2]
   00:0c:29:53:f8:c9    0.320s (    10000.0)  00:0c:29:53:f8:d3 [      eth1]
 * 00:0c:29:53:f8:c9    0.320s (    10000.0)  00:0c:29:53:f8:c9 [      eth0]

batctl n
[B.A.T.M.A.N. adv 2019.4, MainIF/MAC: eth0/00:0c:29:c5:d2:da (bat0/de:8b:cc:39:d0:69 BATMAN_V)]
IF             Neighbor              last-seen
   00:0c:29:53:f8:c9    0.436s (    10000.0) [      eth0]
   00:0c:29:53:f8:d3    0.340s (    10000.0) [      eth1]
   00:0c:29:53:f8:dd    0.116s (    10000.0) [      eth2]

batctl tg
[B.A.T.M.A.N. adv 2019.4, MainIF/MAC: eth0/00:0c:29:c5:d2:da (bat0/de:8b:cc:39:d0:69 BATMAN_V)]
   Client             VID Flags  Last ttvn  Via               ttvn  (CRC       )
 * 33:33:00:00:00:02   -1 [....] (  1)      00:0c:29:53:f8:c9 (  2) (0x6b62ac80)
 * 01:00:5e:00:00:01   -1 [....] (  2)      00:0c:29:53:f8:c9 (  2) (0x6b62ac80)
 * 4e:b3:25:58:bd:15   -1 [....] (  1)      00:0c:29:53:f8:c9 (  2) (0x6b62ac80)
 * 33:33:00:00:00:01   -1 [....] (  1)      00:0c:29:53:f8:c9 (  2) (0x6b62ac80)
I do not directly see any of the commands outputting transmit quality. I would expect the three Ethernet NICs to be the same, but that is an assumption.
Here is the same info under IV:

batctl o
[B.A.T.M.A.N. adv 2019.4, MainIF/MAC: eth2/00:0c:29:c5:d2:ee (bat0/f2:49:86:e6:ea:aa BATMAN_IV)]
   Originator        last-seen (#/255) Nexthop           [outgoingIF]
 * 00:0c:29:53:f8:d3    0.976s   (255) 00:0c:29:53:f8:d3 [      eth1]
 * 00:0c:29:53:f8:c9    0.944s   (251) 00:0c:29:53:f8:c9 [      eth0]
 * 00:0c:29:53:f8:dd    0.368s   (255) 00:0c:29:53:f8:c9 [      eth0]
   00:0c:29:53:f8:dd    0.368s   (255) 00:0c:29:53:f8:d3 [      eth1]
   00:0c:29:53:f8:dd    0.368s   (252) 00:0c:29:53:f8:dd [      eth2]

batctl n
[B.A.T.M.A.N. adv 2019.4, MainIF/MAC: eth2/00:0c:29:c5:d2:ee (bat0/f2:49:86:e6:ea:aa BATMAN_IV)]
IF             Neighbor              last-seen
eth0           00:0c:29:53:f8:c9     0.032s
eth1           00:0c:29:53:f8:d3     0.992s
eth2           00:0c:29:53:f8:dd     0.384s

batctl tg
[B.A.T.M.A.N. adv 2019.4, MainIF/MAC: eth2/00:0c:29:c5:d2:ee (bat0/f2:49:86:e6:ea:aa BATMAN_IV)]
   Client             VID Flags  Last ttvn  Via               ttvn  (CRC       )
 * 33:33:00:00:00:02   -1 [....] (  2)      00:0c:29:53:f8:dd (  3) (0x9339b660)
 * 01:00:5e:00:00:01   -1 [....] (  3)      00:0c:29:53:f8:dd (  3) (0x9339b660)
 * 2a:78:9d:5f:f3:f6   -1 [....] (  1)      00:0c:29:53:f8:dd (  3) (0x9339b660)
 * 33:33:00:00:00:01   -1 [....] (  2)      00:0c:29:53:f8:dd (  3) (0x9339b660)
Thank you again, Brian
On Friday, September 10, 2021 7:59:54 PM CEST brian.edmisten@viasat.com wrote:
[...]
Hi Brian,
thank you very much for providing that output. "TQ" (transmit quality) exists only in BATMAN IV; BATMAN V uses a throughput-based metric instead (in kbit/s). For Ethernet, it tries to read the Ethernet speed directly, which is why you see those 10000 values.
Anyway, in BATMAN IV the values look close enough (they need to be within 50 TQ points). Just as a sanity check, did you enable bonding? It is disabled by default. You can use batctl b 1 to enable it.
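The "within 50 TQ points" rule can be checked against the BATMAN IV originator table posted earlier in the thread. The sketch below is only an illustration of that rule as stated here: the function name is hypothetical, and treating a difference of exactly 50 as still acceptable is an assumption.

```python
TQ_WINDOW = 50  # "within 50 TQ points", as stated in the message above


def bonding_candidates(routes, window=TQ_WINDOW):
    """Return the routes whose TQ is within `window` of the best route's TQ.

    Each route is a (next hop, outgoing interface, TQ) tuple.
    """
    best = max(tq for _, _, tq in routes)
    return [route for route in routes if best - route[2] <= window]


# The three routes towards 00:0c:29:53:f8:dd from the BATMAN IV `batctl o` output.
ROUTES_TO_DD = [
    ("00:0c:29:53:f8:c9", "eth0", 255),
    ("00:0c:29:53:f8:d3", "eth1", 255),
    ("00:0c:29:53:f8:dd", "eth2", 252),
]

for nexthop, iface, tq in bonding_candidates(ROUTES_TO_DD):
    print(f"{iface}: via {nexthop}, TQ {tq}")
```

All three routes pass the check, which is consistent with the values looking "close enough".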
Unfortunately there is not really logging code for debugging, so let's try checking the settings. If that doesn't work, I could rebuild and verify ...
Cheers, Simon
Simon,
I did check again. batctl bonding responds with enabled.
Cheers, Brian Edmisten
Hi Brian,
hmm, I see. I will try to set up this scenario over the next few days and let you know. I haven't used bonding for quite a while now, but I also don't think that we had changes in the code which would break it.
Anyway, will test and let you know.
Cheers, Simon
On Tuesday, September 14, 2021 6:57:37 PM CEST Edmisten, Brian wrote:
[...]
Simon,
Thank you. I appreciate you looking at this.
Regards, Brian Edmisten
-----Original Message----- From: Simon Wunderlich [mailto:sw@simonwunderlich.de] Sent: Wednesday, September 15, 2021 12:26 AM To: b.a.t.m.a.n@lists.open-mesh.org; Edmisten, Brian Brian.Edmisten@viasat.com Subject: Re: Bonding Alternating
[...]
Hi Brian,
I've checked it out and can confirm your issues. The bonding code as currently implemented is trying to use a different router from each routing table towards the same originator[1]. However, with 1-hop Ethernet links those routers are always the same in all the routing tables. With WiFi that would be a bit different (I've commented out the WiFi penalty check), but even then it only alternates between two of the three interfaces.
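The effect described above can be pictured with a toy model. This is not the batman-adv implementation: the table contents and function names are assumptions made for illustration. Each per-interface routing table holds one selected router for the originator, and packets are alternated only over the distinct routers found.

```python
ORIGINATOR = "00:0c:29:53:f8:c9"

# Assumed table contents for directly connected Ethernet links: every
# per-interface table selects the same router on the same outgoing interface.
ETHERNET_TABLES = {
    "eth0": {ORIGINATOR: ("00:0c:29:53:f8:c9", "eth0")},
    "eth1": {ORIGINATOR: ("00:0c:29:53:f8:c9", "eth0")},
    "eth2": {ORIGINATOR: ("00:0c:29:53:f8:c9", "eth0")},
}


def distinct_routers(tables, originator):
    """Collect the distinct (next hop, outgoing interface) pairs across tables."""
    found = []
    for table in tables.values():
        router = table.get(originator)
        if router is not None and router not in found:
            found.append(router)
    return found


def spread(routers, num_packets):
    """Alternate packets over the routers; return the packet count per router."""
    if not routers:
        return {}
    counts = {router: 0 for router in routers}
    for seq in range(num_packets):
        counts[routers[seq % len(routers)]] += 1
    return counts


print(spread(distinct_routers(ETHERNET_TABLES, ORIGINATOR), 9))
```

Because all three tables agree, only one distinct router remains and every packet leaves through eth0, even though three links exist.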
At this point I don't have a straightforward fix for this. Will you use three Ethernet devices in your later deployment, or will those be WiFi interfaces? Also, would it be useful for you to consider the bonding/team interfaces of the Linux kernel to bond the links, and give the bonded interface to batman-adv?
Cheers, Simon
[1] https://www.open-mesh.org/projects/batman-adv/wiki/Network-wide-multi-link-optimization
On Wednesday, September 15, 2021 4:58:58 PM CEST Edmisten, Brian wrote:
[...]
Simon,
In the scenario we are currently working with, we have two different radio systems that each already provide a layer 2 mesh network. To the user they look like two Ethernet interfaces, one for one waveform and one for the other. So far BATMAN is making the network more stable, in that it converges much faster. There is an opportunity for three different radio systems, but the third vendor is unconfirmed. There was a request to try to increase bandwidth if the nodes are known to be close together. We were trying out BATMAN's bonding feature because using it could simplify our setup and reduce some of the overhead we are getting from the layers of software we currently use.
When you say one hop, do you mean one BATMAN hop or something else? If it makes a difference, my testing was direct, but I think the radios will actually look as if there is a switch between the nodes.
Thank you for looking into this for me. BATMAN is doing great for our first use case.
Thank you, Brian Edmisten
-----Original Message----- From: Simon Wunderlich [mailto:sw@simonwunderlich.de] Sent: Tuesday, September 21, 2021 7:16 AM To: b.a.t.m.a.n@lists.open-mesh.org; Edmisten, Brian Brian.Edmisten@viasat.com Subject: Re: Bonding Alternating
[...]
Hi Brian,
please see inline:
On Tuesday, September 21, 2021 5:41:07 PM CEST Edmisten, Brian wrote:
> Simon,
> The current scenario we are working with we have two different radio systems that already provide a layer 2 mesh network each. To the user they look like two Ethernet interfaces one for one wave form and one for the other. BATMAN so far is making it more stable in that the convergence of the network is much faster. There is an opportunity for 3 different radio systems, but the third vendor is unconfirmed. There was an ask to try to increase bandwidth if the nodes were known to be close together. We were trying out BATMAN's bonding features as using it could simplify our setup and reduce some of the overhead we are getting with the layers or software we are currently using.
Thank you for elaborating! Are these radios providing the same throughput? One thing I noticed when doing tests back then is that the slower link will slow down the combined link, since packets are sent in a round-robin fashion. In other words, with two links, if the slow link has half the throughput of the fast link, you will not have any benefit.
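The arithmetic behind this can be sketched as follows. The model is a simplification and an assumption: strict round-robin with equally sized packets on saturated links, so every link carries the same share and the slowest link sets the pace.

```python
def round_robin_throughput(link_rates):
    """Aggregate rate when packets are spread evenly over all links."""
    return len(link_rates) * min(link_rates)


def bonding_gain(link_rates):
    """Aggregate rate relative to using only the fastest link on its own."""
    return round_robin_throughput(link_rates) / max(link_rates)


# Rates are in arbitrary units; only the ratio between the links matters.
print(bonding_gain([100, 50]))  # slow link at half the fast link's rate
print(bonding_gain([100, 80]))  # slow link at 80% of the fast link's rate
```

Under this model a slow link at half the rate gives a gain of 1.0, which is the "no benefit" case, while a slow link at 80% would give a gain of 1.6 before any reordering effects are taken into account.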
> When you say one hop, do you mean one BATMAN hop or something else? If it makes a difference my testing was direct but I think the radios will actually look like there is a switch between the nodes.
Whether there is a switch or not doesn't matter to BATMAN. By one hop I meant that they are directly connected via Layer 2; there is no intermediate BATMAN hop acting as a relay.
Since you will be using Ethernet links and not WiFi links, BATMAN will not be able to detect that you are actually using radio links, since it's only checking kernel-internal structures (whether the device uses cfg80211 or wext). I'm adding a patch to generally treat interfaces like wireless interfaces from a routing perspective; this could also make a difference for your VM tests.
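A rough userspace approximation of that check is to look at what the kernel exposes for an interface in sysfs. This is an assumption-laden sketch and not what batman-adv does internally: it relies on cfg80211 devices exposing a phy80211 entry and on wireless-extension support exposing a wireless entry under /sys/class/net/<iface>, which depends on the kernel configuration.

```python
from pathlib import Path


def looks_wireless(iface, sysfs_root="/sys/class/net"):
    """Return True if sysfs exposes a wireless-related entry for `iface`.

    Checks for a `phy80211` entry (cfg80211) or a `wireless` entry (wext).
    """
    base = Path(sysfs_root) / iface
    return (base / "phy80211").exists() or (base / "wireless").exists()
```

A radio that presents itself to the host as a plain Ethernet interface has neither entry, so looks_wireless returns False for it even though the underlying link is a radio.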
> Thank you for looking in to this for me. BATMAN is doing great for our first use case.
Great to hear :)
Good luck using it and thank you for your feedback!
Cheers, Simon
Simon,
Thanks for clearing up the hop information.
The radios are not exactly the same throughput-wise, but they are similar at short distance. One is about 80% of the other.
Regards, Brian Edmisten
-----Original Message----- From: Simon Wunderlich [mailto:sw@simonwunderlich.de] Sent: Wednesday, September 22, 2021 12:55 AM To: b.a.t.m.a.n@lists.open-mesh.org; Edmisten, Brian Brian.Edmisten@viasat.com Subject: Re: Bonding Alternating
[...]