Hello,
I have a question about how Batman V handles throughput. Batman doesn't seem to calculate throughput properly, so we set it manually with throughput override. But then, even when the actual throughput of the active interface decreases, it doesn't switch to the other interface because it only considers the overridden value.
It only switches when the active interface stops receiving OGMs completely. I think this wouldn't be a problem if throughput were calculated properly, so I want to ask why it is the way it is. Batman already has a tool called throughput meter; shouldn't it be used to continuously check the value?
Hi,
Batman doesn't seem to calculate throughput properly
Please provide details about what you mean. What is the expected vs. calculated throughput, and how have you determined that the calculation is wrong?
so we set it manually with throughput override, but then even when actual throughput of the active interface decreases, it doesn't switch to the other interface because it only considers the overridden value.
Please describe the steps to reproduce this problem.
Batman already has a tool called throughput meter, shouldn't it be used to continuously check the value?
A number of patches were proposed to integrate the throughput meter as a fallback when the throughput cannot be determined via other means:
https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-...
There were a few open issues that require further work. If you are interested in spending time on this subject, I am happy to provide assistance.
Cheers, Marek
We have two modems for each node and in one of them, expected throughput should be about 6 Mb/s for example, and in the other one it should be about 30 Mb/s. By using iperf and also throughput meter I can see that it's the case. But when they are added to batman with batctl if add, after typing batctl o, I see that the throughput values in both interfaces are 10000 instead.
I looked at the interfaces with ethtool and the speed is 10000 Mb/s there for both too, which must be how batman determines the throughput. This isn't good because it doesn't reflect the actual speed. If we use throughput override, it's fine at first, but one of the modems has a shorter range. In our test, where two nodes move away from each other, the actual throughput decreases due to losses but batman still chooses the same interface because of the overridden value.
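For reference, the link speed that ethtool prints is also exposed by the kernel in sysfs, so the mismatch can be checked from a script. The following is only a minimal sketch: the helper name `read_link_speed_mbps` and the `sysfs_root` parameter are made up for illustration, and it assumes the driver exposes the standard `speed` attribute (in Mb/s) under `/sys/class/net/<iface>/`.

```python
from pathlib import Path
from typing import Optional


def read_link_speed_mbps(iface: str,
                         sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the link speed in Mb/s reported for iface, or None if unknown.

    Hypothetical helper: it reads the 'speed' attribute that the kernel
    exposes per network interface. This is the nominal physical-layer
    speed, not a measured throughput.
    """
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        speed = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        # File missing, interface down, or driver without speed reporting.
        return None
    # Drivers report a non-positive value when the speed is unknown.
    return speed if speed > 0 else None
```

On a setup like the one described above, both modem interfaces would be expected to return 10000 here regardless of what iperf measures.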
Basically I would prefer batman being able to change measured throughput dynamically.
Hi,
We have two modems for each node and in one of them, expected throughput should be about 6 Mb/s for example, and in the other one it should be about 30 Mb/s. By using iperf and also throughput meter I can see that it's the case. But when they are added to batman with batctl if add, after typing batctl o, I see that the throughput values in both interfaces are 10000 instead.
I looked at the interfaces with ethtool and the speed is 10000 Mb/s there for both too which is how batman must be measuring the throughput
Correct. If the underlying interface provides a link speed via ethtool, batman uses the ethtool API to get the throughput value.
If we use throughput override, it's fine at first but one of the modems has a shorter range so in our test where two nodes move away from each other, actual throughput gets decreased due to losses but batman still chooses the same interface due to the overridden value.
That is what the manual override is meant to do. A manual value that will override all dynamically determined values.
Can you explain what type of "modem" you are talking about? It is not clear why a modem depends on range. Or are you talking about a batman mesh connecting various modems? Please share the topology of your setup.
Is this somehow related to your earlier statement: "[..] but then even when actual throughput of the active interface decreases, it doesn't switch to the other interface because it only considers the overridden value."?
Basically I would prefer batman being able to change measured throughput dynamically.
If I understand correctly, you are changing from "Batman doesn't seem to calculate throughput properly" to "measured throughput is preferable"? So there is no calculation issue with Batman V?
Cheers, Marek
I mean batman is getting the right value from ethtool, but that value is not useful because the real throughput can drop as two nodes move away from each other. After checking the batman code, I have a better understanding. Batman's throughput calculation for wifi interfaces is probably fine because it uses cfg80211's expected_throughput. But we are connecting custom modems to ethernet interfaces, so they aren't wlan interfaces and batman uses the speed value from ethtool, which isn't always accurate.
We also did tests in a virtual environment, and according to this commit https://git.open-mesh.org/batman-adv.git/commit/6e860b3d5e4147bafcda32bf9b3e..., ethtool link speed detection used to be disabled for such cases but that got reverted since automatic measurements aren't implemented. So, is the throughput_meter fallback method that is being worked on right now supposed to be the automatic measurement for cases like this? Whatever the method is, dynamically calculating throughput is a must because, like I said, one of our modems has a shorter range: it is faster when two nodes are close, but as the nodes move away there is a lot of packet loss, so the real throughput drops as well while the overridden throughput value stays the same.
BATMAN_IV doesn't have that problem because it considers packet loss, but it is worse because it doesn't take throughput into account, so we can't use that either. If there is any way to take packet loss into account on BATMAN_V that I'm not aware of, I would like to learn about it, but I'm guessing probably not.
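The order of throughput sources described in this thread so far (manual override first, then the wifi driver's expected throughput, then the ethtool link speed) can be sketched roughly as below. This is an illustration of that order only, not the batman-adv code: the function name, its parameters and the kbit/s units are assumptions, and what happens when no source is available is not covered in this thread, so the sketch simply returns None.

```python
from typing import Optional


def select_throughput_kbps(override_kbps: int,
                           is_wifi: bool,
                           wifi_expected_kbps: Optional[int],
                           ethtool_speed_mbps: Optional[int]) -> Optional[int]:
    """Pick a throughput value for one interface (illustrative only)."""
    # 1. A manual override wins over every dynamically determined value.
    if override_kbps > 0:
        return override_kbps
    # 2. Wifi interfaces: use the driver's expected throughput, which
    #    changes as the link quality changes.
    if is_wifi:
        return wifi_expected_kbps
    # 3. Other interfaces: use the nominal ethtool link speed (Mb/s).
    if ethtool_speed_mbps is not None and ethtool_speed_mbps > 0:
        return ethtool_speed_mbps * 1000
    # No usable source; the real fallback is outside this sketch.
    return None
```

With an ethernet-attached modem and no override, step 3 yields the 10000 Mb/s link speed, which is why both interfaces look identical to batman.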
I see that the last patch for tp fallback was written in 2018, has there been no more progress since then? And what are the problems with it?
On Friday, 5 April 2024 10:06:56 CEST berkay.demirci@protonmail.com wrote:
We also did tests in virtual environment and according to this commit https://git.open-mesh.org/batman-adv.git/commit/6e860b3d5e4147bafcda32bf9b3e769926f232c5, ethtool link speed detection used to be disabled for such cases but got reverted since automatic measurements aren't implemented.
No. Batman-adv used the ethtool 'auto-negotiation' on/off state to decide whether the ethtool throughput value should be trusted.
As the commit states, the 'auto-negotiation' state has no impact on whether the reported throughput value should be trusted. Auto-negotiation could be on or off and still the value is wrong.
dynamically calculating throughput is a must because like I said, one of our modems has a shorter distance range so it is faster when two nodes are close but as nodes move away, there is lots of packet loss so the real throughput drops as well, but the overridden throughput value stays the same.
You keep mentioning "modems have distance", "move away", "ethtool", etc. without having explained what your setup is. Without details about these "modems with range" and what interface types you are talking about, nobody can really comment on your setup.
I see that the last patch for tp fallback was written in 2018, has there been no more progress since then? And what are the problems with it?
The main obstacle is time & energy to work on the tp fallback integration. Open issues were mentioned in the responses to the various patches.
If you are interested in spending time on these patches, I am happy to provide assistance.
Cheers, Marek
I attached a file that has graphs that show what is happening. Worksheets from THR5 up to THRS9 and LOWMGEN are the ones relevant and the difference between them is the period of ELP and OGM. But changing them didn't make much of a difference in the end so we can just look at THR5.
We normally have 4 routers, but for the test we are simulating we used network 1, whose throughput is 30000 kbps, and network 3, whose throughput is 8000 kbps, and we give those values to batman with throughput_override. The graph shows the number of OGM and ELP packets sent by node-1 on networks 1 and 3 that were able to reach node-2, as well as the PDR (packet delivery ratio), which starts dropping heavily due to packet loss and then picks back up when switching to network 3. The "Node-2 connection NW" graph shows which network is chosen by batman: network 1 at first, then 3, etc.
In the test scenario, two nodes move away from each other, so packet loss increases over time, but it increases more in network 1. Since the throughput values are overridden, batman still chooses that network based on that value. Only when OGM messages stop arriving on network 1 does batman switch to network 3, and we see the PDR increase to 1 immediately when that happens.
Basically we want batman to be able to switch earlier than that, and that's why I asked about the throughput meter implementation: the overridden throughput value doesn't consider packet loss. Another idea we had was to change the throughput value from a script if packet loss increases too much, or something like that; we haven't thought it through in detail yet. So I'm asking whether you have any suggestion that takes packet loss into account as well.
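To make the script idea concrete, here is a rough sketch of one way it could look: scale each network's nominal throughput by its measured PDR and write the result back as the override. Everything here is an assumption rather than something batman provides: the function names, the 100 kbps floor (to avoid writing 0, which would presumably disable the override), and the `batctl hardif <iface> throughput_override <value>` invocation, which depends on the batctl version and on the value being interpreted as kbit/s. How the PDR is measured is left open.

```python
from typing import Dict, List


def effective_throughput_kbps(nominal_kbps: int, pdr: float,
                              min_kbps: int = 100) -> int:
    """Scale the nominal throughput by the packet delivery ratio (0..1)."""
    pdr = min(max(pdr, 0.0), 1.0)  # clamp bad measurements
    # Assumed floor so the override is never written as 0.
    return max(round(nominal_kbps * pdr), min_kbps)


def best_network(nominal_kbps: Dict[str, int], pdr: Dict[str, float]) -> str:
    """Name of the network with the highest loss-adjusted throughput."""
    return max(nominal_kbps,
               key=lambda n: effective_throughput_kbps(nominal_kbps[n],
                                                       pdr.get(n, 0.0)))


def override_command(iface: str, kbps: int) -> List[str]:
    """Build (not run) the assumed batctl call that sets the override."""
    return ["batctl", "hardif", iface, "throughput_override", str(kbps)]


if __name__ == "__main__":
    nominal = {"nw1": 30000, "nw3": 8000}  # values from the test above
    measured = {"nw1": 0.2, "nw3": 1.0}    # hypothetical PDR samples
    print(best_network(nominal, measured))
    print(override_command("eth1", effective_throughput_kbps(30000, 0.2)))
```

With the throughput values used in the test, network 3 would start to win once the PDR on network 1 falls below 8000/30000 (roughly 0.27) while network 3 still delivers everything. A real script would likely need some hysteresis so the override does not flap around that point.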
Otherwise, I'd also appreciate the assistance you could provide for the patches for tp fallback implementation. Does it work at all at its current state even with problems or is it not there yet?
The setup is that we have custom routers connected to the ethernet ports of Ubuntu computers, and in the simulation they are set up virtually. I don't think the details matter because, without the throughput override, ethtool is used and it just gives the maximum physical layer speed, which Sven Eckelmann also mentions in the last reply in https://lists.open-mesh.org/mailman3/hyperkitty/list/b.a.t.m.a.n@lists.open-...
On Monday, 15 April 2024 10:20:20 CEST Berkay Demirci wrote:
I attached a file that has graphs that show what is happening. Worksheets from THR5 up to THRS9 and LOWMGEN are the ones relevant and the difference between them is the period of ELP and OGM.
Why have you decided to configure the ELP interval to 1s and OGM interval to 0.5s?
Cheers, Marek
Just to try different combinations and see if it made any difference, for example whether it could switch earlier. But either way, it only switches to network 3 when OGMs stop reaching node 2 on network 1.
On Monday, 15 April 2024 10:20:20 CEST Berkay Demirci wrote:
In the test scenario, two nodes move away from each other so packet loss increases over time but it increases more in network 1, and since the throughput values are overridden, batman still chooses that network based on that value. Only when OGM messages stop reaching in network 1, batman switches to network 3 and we see the PDR increasing to 1 immediately when that happens.
Correct, the throughput override is a static value and does not adjust to a changing environment. On a wireless interface, the estimated throughput would be adjusted as the nodes move away from each other (batman-adv is able to read estimated throughput values from the WiFi driver).
Basically we want batman to be able to switch earlier than that and that's why I asked about the throughput meter implementation because the batman overridden throughput value doesn't consider packet losses.
Exactly, the throughput override does not consider anything other than the configured value.
Another idea we had was to manually change the throughput value via a script if packet loss increases too much or something like that, we haven't thought in detail yet. So I'm asking if you could have any suggestion that considers packet loss as well.
It seems you are attempting to simulate a wireless environment using wired devices. Wired devices typically cannot "move away" from each other, hence you are running into this issue with your simulation approach.
Maybe mac80211 hwsim is an option for you? I've never used it, so can't provide specific suggestions.
Have you considered testing on a wireless testbed?
Otherwise, I'd also appreciate the assistance you could provide for the patches for tp fallback implementation. Does it work at all at its current state even with problems or is it not there yet?
Your question isn't entirely clear to me. As a first step, you'd have to rebase the tp meter patches to work on your chosen batman-adv version. How much work that might be is hard to ascertain without trying it.
Cheers, Marek