From: Simon Wunderlich <simon@open-mesh.com>
The default hop penalty is currently set to 15, which is applied as-is on multi interface devices (e.g. dual band APs). Single band devices will still use an effective penalty of 30 (hop penalty + wifi penalty).
After receiving reports of too long paths in mesh networks with dual band APs, which were fixed by increasing the hop penalty, we'd like to suggest increasing the default value as well. We've evaluated that increase in a handful of medium sized mesh networks (5-20 nodes) with single and dual band devices, with changes for the better (shorter routes, higher throughput) or no change at all.
This patch changes the hop penalty to 30, which will give an effective penalty of 60 on single band devices (hop penalty + wifi penalty).
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
---
 soft-interface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/soft-interface.c b/soft-interface.c
index e783afb..9bf382d 100644
--- a/soft-interface.c
+++ b/soft-interface.c
@@ -757,7 +757,7 @@ static int batadv_softif_init_late(struct net_device *dev)
 	atomic_set(&bat_priv->gw.bandwidth_down, 100);
 	atomic_set(&bat_priv->gw.bandwidth_up, 20);
 	atomic_set(&bat_priv->orig_interval, 1000);
-	atomic_set(&bat_priv->hop_penalty, 15);
+	atomic_set(&bat_priv->hop_penalty, 30);
 #ifdef CONFIG_BATMAN_ADV_DEBUG
 	atomic_set(&bat_priv->log_level, 0);
 #endif
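To see how these numbers act on the routing metric: batman-adv scales an OGM's transmit quality (TQ, 0-255) down at every hop. The following minimal userspace sketch is modeled on the kernel's batadv_hop_penalty(); that the wifi penalty is simply added to the hop penalty on single band devices is an assumption taken from the description above, and the kernel's actual call sites may differ.

#include <stdio.h>

#define TQ_MAX_VALUE 255

/* Modeled on batman-adv's batadv_hop_penalty(): scale the TQ of a
 * forwarded OGM down by the configured penalty. */
static unsigned int apply_penalty(unsigned int tq, unsigned int penalty)
{
	return tq * (TQ_MAX_VALUE - penalty) / TQ_MAX_VALUE;
}

int main(void)
{
	unsigned int hop = 30, wifi = 30; /* new defaults per this patch */
	unsigned int tq = 245;            /* TQ of an OGM to be forwarded */

	/* multi interface device: hop penalty only */
	printf("multi interface: %u -> %u\n", tq, apply_penalty(tq, hop));
	/* single band device: hop penalty + wifi penalty (assumed additive) */
	printf("single band:     %u -> %u\n", tq, apply_penalty(tq, hop + wifi));
	return 0;
}

A TQ of 245 is forwarded as 216 by a multi interface device and as 187 by a single band device; over several hops these reductions compound, which is what pushes the protocol toward shorter paths.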
On Tue, Jun 17, 2014 at 12:16:03PM +0200, Simon Wunderlich wrote:
> This patch changes the hop penalty to 30, which will give an effective penalty of 60 on single band devices (hop penalty + wifi penalty).
"batman-adv: encourage batman to take shorter routes by changing the default hop penalty" (6a12de1939281dd7fa62a6e22dc2d2c38f82734f)
This patch changed the hop penalty for single (and back then also dual) band devices from 10 to 30.
If 60 were always the correct value, why wasn't it changed from 10 to 60 back then?
If the reason was not having measured it thoroughly enough back then, why would your latest measurements be any more thorough? (For instance, what will prevent the hop penalty from being changed again next year?)
Any data for others to check?
Cheers, Linus
Hi Linus,
> On Tue, Jun 17, 2014 at 12:16:03PM +0200, Simon Wunderlich wrote:
> > This patch changes the hop penalty to 30, which will give an effective penalty of 60 on single band devices (hop penalty + wifi penalty).
"batman-adv: encourage batman to take shorter routes by changing the default hop penalty" (6a12de1939281dd7fa62a6e22dc2d2c38f82734f)
> This patch changed the hop penalty for single (and back then also dual) band devices from 10 to 30.
that's right. Actually, at that time I was using 50 for most of my networks, so 30 was a compromise.
> If 60 were always the correct value, why wasn't it changed from 10 to 60 back then?
There is no such thing as a correct value for that. The hop penalty is an empirical value derived from various experiments. The original idea was to introduce an artificial decrease of the metric for perfect networks (e.g. Ethernet) to avoid loops, but it turned out that it can also be useful to avoid route flapping between paths of different lengths, or to compensate for small changes in the measurement. For example, when we placed 10 routers in one place, the routes were flapping between 1 hop (which would be expected) and 2 hops, because of small changes in the TQ measurement. We then increased the hop penalty from 10 to 30 (or even 50), which solved that problem.
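To make that concrete, here is a small standalone sketch. The TQ values are invented for illustration, and the multiply-and-penalize path metric below is a simplification of what B.A.T.M.A.N. IV actually computes:

#include <stdio.h>

#define TQ_MAX_VALUE 255

static unsigned int apply_penalty(unsigned int tq, unsigned int penalty)
{
	return tq * (TQ_MAX_VALUE - penalty) / TQ_MAX_VALUE;
}

/* Simplified end-to-end TQ of a two-hop path: multiply the two link
 * TQs and charge the hop penalty once, at the relay. */
static unsigned int two_hop_tq(unsigned int tq1, unsigned int tq2,
			       unsigned int penalty)
{
	return apply_penalty(tq1 * tq2 / TQ_MAX_VALUE, penalty);
}

int main(void)
{
	unsigned int direct = 230;       /* decent direct link */
	unsigned int l1 = 250, l2 = 250; /* two very good links */
	unsigned int penalties[] = { 10, 30, 50 };
	int i;

	for (i = 0; i < 3; i++) {
		unsigned int detour = two_hop_tq(l1, l2, penalties[i]);

		printf("penalty %2u: direct %u vs two-hop %u -> %s\n",
		       penalties[i], direct, detour,
		       detour > direct ? "detour wins" : "direct wins");
	}
	return 0;
}

With a penalty of 10 the two candidates end up only a few TQ points apart (235 vs. 230), so ordinary measurement noise flips the decision back and forth; with 30 or 50 the direct route wins with a comfortable margin.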
> If the reason was not having measured it thoroughly enough back then, why would your latest measurements be any more thorough? (For instance, what will prevent the hop penalty from being changed again next year?)
What is "thoroughly enough"? I didn't do "scientifical research" or write any paper on that, and don't plan to do so. It's a default value, but anyone who has a better idea can change that. It's solely based on our personal experience. I don't guarantee that we will not change it again next year, but last time we kept it for quite some time too ...
I tested it on 7 networks with 10-20 nodes each, and different types of devices. That is certainly more than last time. If you have the time/resources to do a bigger / more detailed test, feel free to do so and share your results. :)
> Any data for others to check?
Nope, unfortunately these are customer networks, and I can't reveal data from them in public. But I can certainly explain how I tested: We were running Antonio's throughput meter on these devices and saw some unusually slow throughput and too long paths (4 hops where 2 were possible). We then increased the hop penalty to the suggested value, and both the hop count decreased and the throughput increased. We repeated that with the other 6 networks and had either similar improvements or no change at all (since all hop counts were already one).
Cheers, Simon
On Wed, Jun 18, 2014 at 11:21:14AM +0200, Simon Wunderlich wrote:
> > Any data for others to check?
> Nope, unfortunately these are customer networks, and I can't reveal data from them in public.
That's very, very unfortunate... and made my hair stand on end. It clashes a little with, and undermines, a point I love a lot about free software... Anyways, maybe that's not something to discuss on a mailing list.
Damn it, why don't we have the stupid hop count in the measurements from the last WBM? Would have been very easy to verify with that.
Maybe we could try using the WBM to transparently find better default values in the future (again; I remember that you had made nice graphs for the decision of having interface-alternating or interface-bonding as the default back then at WBMv3 in Italy - that was awesome!)?
> But I can certainly explain how I tested: We were running Antonio's throughput meter on these devices and saw some unusually slow throughput and too long paths (4 hops where 2 were possible). We then increased the hop penalty to the suggested value, and both the hop count decreased and the throughput increased. We repeated that with the other 6 networks and had either similar improvements or no change at all (since all hop counts were already one).
What mcast-rate were you using? Will this make things worse for setups with a different mcast-rate?
> Cheers, Simon
Cheers, Linus
> On Wed, Jun 18, 2014 at 11:21:14AM +0200, Simon Wunderlich wrote:
> > > Any data for others to check?
> > Nope, unfortunately these are customer networks, and I can't reveal data from them in public.
> That's very, very unfortunate... and made my hair stand on end. It clashes a little with, and undermines, a point I love a lot about free software... Anyways, maybe that's not something to discuss on a mailing list.
I don't quite get why you are so emotional about that. There are tons of other default settings and "heuristic" values which we determined with much less "scientific" effort - e.g. the wifi penalty, local window size, request timeout, tq global window size, broadcast number ... and nobody cried about setting these values or changing them.
I understand that it would be nicer to have all the data public, but open software is used in private and/or commercial environments as well, and we should respect that these people don't want their network topology revealed. These networks are not a public playground. Of course, if you want you can repeat these kinds of experiments in your community or test mesh networks (weren't there some EU projects who offered that kind of stuff? :] )
> Damn it, why don't we have the stupid hop count in the measurements from the last WBM? Would have been very easy to verify with that.
Very easy ...? Well, if you think so, please propose/perform/evaluate these tests at the next battlemesh. :)
> Maybe we could try using the WBM to transparently find better default values in the future (again; I remember that you had made nice graphs for the decision of having interface-alternating or interface-bonding as the default back then at WBMv3 in Italy - that was awesome!)?
Yeah, that wasn't so bad, but the tests were not very extensive either - 3 devices with special hardware and setup. We could show the gains of alternating/bonding after all ... ;)
In any case, feel free to propose these kinds of tests for the next WBM.
> > But I can certainly explain how I tested: We were running Antonio's throughput meter on these devices and saw some unusually slow throughput and too long paths (4 hops where 2 were possible). We then increased the hop penalty to the suggested value, and both the hop count decreased and the throughput increased. We repeated that with the other 6 networks and had either similar improvements or no change at all (since all hop counts were already one).
> What mcast-rate were you using? Will this make things worse for setups with a different mcast-rate?
The mcast rate was 18M. I don't know if it gets "worse" for different mcast rates, and it depends on what we consider "worse". In general, I'd expect that the protocol chooses longer links / shorter paths, for all mcast rates.
Cheers, Simon
I will be setting up a test network with 20-25 radios before our big roll-out. I can make the network available for testing if that would help at all.
-----Original Message-----
From: B.A.T.M.A.N [mailto:b.a.t.m.a.n-bounces@lists.open-mesh.org] On Behalf Of Simon Wunderlich
Sent: Thursday, June 19, 2014 12:18 PM
To: b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [B.A.T.M.A.N.] [PATCH] batman-adv: increase default hop penalty
Hi Simon,
first of all, sorry for getting so emotional about it. My bad, I know - it's usually not very constructive to get emotional on a mailing list.
On Thu, Jun 19, 2014 at 06:18:11PM +0200, Simon Wunderlich wrote:
> > Damn it, why don't we have the stupid hop count in the measurements from the last WBM? Would have been very easy to verify with that.
> Very easy ...? Well, if you think so, please propose/perform/evaluate these tests at the next battlemesh. :)
What I meant was that we actually sort of had these tests / the data :). If I remember correctly, there were non-dynamic environment tests on the slides Axel and others presented, where you could see the number of hops each layer 3 routing protocol took and what throughput they had. If there had been hop-count information for batman-adv too, we could have compared it to the other protocols and might have been able to deduce whether more or fewer hops would have resulted in higher throughput.
I will see whether I can help with setting up logging of data from batctl ping next year :).
> > Maybe we could try using the WBM to transparently find better default values in the future (again; I remember that you had made nice graphs for the decision of having interface-alternating or interface-bonding as the default back then at WBMv3 in Italy - that was awesome!)?
> Yeah, that wasn't so bad, but the tests were not very extensive either - 3 devices with special hardware and setup. We could show the gains of alternating/bonding after all ... ;)
> In any case, feel free to propose these kinds of tests for the next WBM.
> > > But I can certainly explain how I tested: We were running Antonio's throughput meter on these devices and saw some unusually slow throughput and too long paths (4 hops where 2 were possible). We then increased the hop penalty to the suggested value, and both the hop count decreased and the throughput increased. We repeated that with the other 6 networks and had either similar improvements or no change at all (since all hop counts were already one).
> > What mcast-rate were you using? Will this make things worse for setups with a different mcast-rate?
> The mcast rate was 18M. I don't know if it gets "worse" for different mcast rates, and it depends on what we consider "worse". In general, I'd expect that the protocol chooses longer links / shorter paths, for all mcast rates.
Thanks, 18MBit/s is valuable information! Hm, I would guess then that everyone using an mcast rate lower than or equal to 18MBit/s should be good to go, right? If nodes are using a lower mcast rate, they mostly have lower packet loss and would need a higher hop penalty to select the same "good" path.
On the other hand, people using an mcast rate higher than 18MBit/s might want a hop penalty lower than 60. But right now, I'm not aware of any such mesh networks, so it probably shouldn't make things worse for other people if your measurements in your mesh were correct (which I believe they were - I never wanted to discredit your capabilities; you, Marek and Antonio are probably the best people to perform such tests in a reliable way).
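A rough way to put numbers on that intuition, reusing the simplified multiply-and-penalize metric sketched earlier in the thread (all TQ values made up for illustration):

#include <stdio.h>

#define TQ_MAX_VALUE 255

/* Smallest hop penalty for which a direct link of TQ "direct" still
 * beats a two-hop detour over two links of TQ "link" each. */
static unsigned int break_even_penalty(unsigned int direct, unsigned int link)
{
	unsigned int base = link * link / TQ_MAX_VALUE;
	unsigned int p;

	for (p = 0; p <= TQ_MAX_VALUE; p++)
		if (base * (TQ_MAX_VALUE - p) / TQ_MAX_VALUE <= direct)
			return p;
	return TQ_MAX_VALUE;
}

int main(void)
{
	/* The lower the mcast rate, the fewer OGMs get lost and the
	 * higher the per-link TQ measures. */
	unsigned int direct = 200;
	unsigned int link_tqs[] = { 240, 250, 255 };
	int i;

	for (i = 0; i < 3; i++)
		printf("link TQ %u: need penalty >= %u for the direct path\n",
		       link_tqs[i], break_even_penalty(direct, link_tqs[i]));
	return 0;
}

The printed thresholds (28, 46 and 55 here) grow with link quality, which matches the guess above: meshes using low, robust mcast rates tolerate - and may need - a higher hop penalty, while higher mcast rates push the break-even point down.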
> Cheers, Simon
PS: Somebody noted on IRC that it might seem that I have a problem with commercial, non-public mesh networks. While I would certainly love it if everyone were setting up their commercial network on top of a free community mesh network (Freifunk, Ninux, etc.), I don't have an issue if someone decides not to do that - that's everyone's free choice :). And people making a living with a commercial mesh network didn't get me emotional at all; that wasn't it.
On Tuesday 17 June 2014 12:16:03 Simon Wunderlich wrote:
> From: Simon Wunderlich <simon@open-mesh.com>
> The default hop penalty is currently set to 15, which is applied as-is on multi interface devices (e.g. dual band APs). Single band devices will still use an effective penalty of 30 (hop penalty + wifi penalty).
> After receiving reports of too long paths in mesh networks with dual band APs, which were fixed by increasing the hop penalty, we'd like to suggest increasing the default value as well. We've evaluated that increase in a handful of medium sized mesh networks (5-20 nodes) with single and dual band devices, with changes for the better (shorter routes, higher throughput) or no change at all.
> This patch changes the hop penalty to 30, which will give an effective penalty of 60 on single band devices (hop penalty + wifi penalty).
> Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
>  soft-interface.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
Applied in revision 7644650.
Thanks, Marek